[PATCH] Adjust testcase to avoid scan FIX in REG_EQUIV.

2024-10-15 Thread liuhongt
Also add the hard_float effective target to avoid failures on arm-eabi, cortex-m0.

Verified on cross-compiler for powerpc64le-linux-gnu, sparc-sun-solaris2.11

Ready to push to trunk.

gcc/testsuite/ChangeLog:

PR testsuite/115365
* gcc.dg/pr100927.c: Adjust testcase to avoid scan FIX in REG_EQUIV.
---
 gcc/testsuite/gcc.dg/pr100927.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr100927.c b/gcc/testsuite/gcc.dg/pr100927.c
index 8a7d69c3831..28a168d3518 100644
--- a/gcc/testsuite/gcc.dg/pr100927.c
+++ b/gcc/testsuite/gcc.dg/pr100927.c
@@ -1,7 +1,8 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
 /* { dg-options "-O2 -ftrapping-math -fdump-tree-optimized -fdump-rtl-final" } */
 /* { dg-final { scan-tree-dump-times {(?n)= \(int\)} 3 "optimized" } }  */
-/* { dg-final { scan-rtl-dump-times {(?n)^[ \t]*\(fix:SI} 3 "final" } }  */
+/* { dg-final { scan-rtl-dump-times {(?n)^(?!.*REG_EQUIV)(?=.*\(fix:SI)} 3 "final" } }  */
 
 int
 foo_ofr ()
-- 
2.31.1



Re: [PATCH 4/7] libstdc++: Remove indirection to __find_if in std::find etc.

2024-10-15 Thread Jonathan Wakely

On 15/10/24 15:20 +0100, Jonathan Wakely wrote:

Tested x86_64-linux.

-- >8 --

There doesn't seem to be a lot of benefit in reusing __find_if with
__gnu_cxx::__ops predicates, since they aren't going to actually
instantiate any less code if we use different predicates every time
(e.g. __ops::__negate, or __ops::__iter_equals_val, or
__ops::__pred_iter).

And now that std::find no longer calls __find_if (because it just does a
loop directly), we can make the _Iter_equals_val case of __find_if call
std::find, to take advantage of its memchr optimization. This benefits
other algos like search_n which use __find_if with _Iter_equals_val.


Hmm, I wonder if inverting the relationship below (__find_if calls
find, instead of vice versa) can cause an ABI problem if one TU has an
instantiation of the old find that calls __find_if, and another TU has
an instantiation of the new __find_if that calls find, and they end up
calling each other recursively until the stack overflows.

We might need to mangle one of them differently.
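A self-contained simulation of that hazard (all names made up; they stand in
for the comdat instantiations of std::find and std::__find_if from the two
header versions):

  // Hypothetical illustration only.  TU1 (old headers): find forwards to
  // __find_if; TU2 (new headers): the _Iter_equals_val overload of __find_if
  // forwards back to find.  If the linker keeps TU1's find and TU2's
  // __find_if, every call ends up in this loop:
  static int *find_if_val (int *first, int *last, int val);

  static int *find_val (int *first, int *last, int val)   // "old" std::find
  {
    return find_if_val (first, last, val);
  }

  static int *find_if_val (int *first, int *last, int val)  // "new" __find_if
  {
    return find_val (first, last, val);   // mutual recursion, no progress
  }

  int main ()
  {
    int a[3] = {1, 2, 3};
    return find_val (a, a + 3, 2) != a + 3;  // recurses until stack overflow
  }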


-  /**
-   *  @brief Find the first occurrence of a value in a sequence.
-   *  @ingroup non_mutating_algorithms
-   *  @param  __first  An input iterator.
-   *  @param  __last   An input iterator.
-   *  @param  __valThe value to find.
-   *  @return   The first iterator @c i in the range @p [__first,__last)
-   *  such that @c *i == @p __val, or @p __last if no such iterator exists.
-  */
-  template<typename _InputIterator, typename _Tp>
-_GLIBCXX20_CONSTEXPR
-inline _InputIterator
-find(_InputIterator __first, _InputIterator __last, const _Tp& __val)
-{
-  // concept requirements
-  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
-  __glibcxx_function_requires(_EqualOpConcept<
-   typename iterator_traits<_InputIterator>::value_type, _Tp>)
-  __glibcxx_requires_valid_range(__first, __last);
-
-#if __cpp_if_constexpr && __glibcxx_type_trait_variable_templates
-  using _ValT = typename iterator_traits<_InputIterator>::value_type;
-  if constexpr (__can_use_memchr_for_find<_ValT, _Tp>)
-   if constexpr (is_pointer_v<_InputIterator>
-#if __cpp_lib_concepts
-   || contiguous_iterator<_InputIterator>
-#endif
-)
- {
-   // If conversion to the 1-byte value_type alters the value,
-   // it would not be found by std::find using equality comparison.
-   // We need to check this here, because otherwise something like
-   // memchr("a", 'a'+256, 1) would give a false positive match.
-   if (!(static_cast<_ValT>(__val) == __val))
- return __last;
-   else if (!__is_constant_evaluated())
- {
-   const void* __p0 = std::__to_address(__first);
-   const int __ival = static_cast<int>(__val);
-   if (auto __n = std::distance(__first, __last); __n > 0)
- if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
-   return __first + ((const char*)__p1 - (const char*)__p0);
-   return __last;
- }
- }
-#endif
-
-  return std::__find_if(__first, __last,
-   __gnu_cxx::__ops::__iter_equals_val(__val));
-}


[...]


+  // When the predicate is just comparing to a value we can use std::find,
+  // which is optimized to memchr for some types.
+  template<typename _Iterator, typename _Value>
+_GLIBCXX20_CONSTEXPR
+inline _Iterator
+__find_if(_Iterator __first, _Iterator __last,
+ __gnu_cxx::__ops::_Iter_equals_val<_Value> __pred)
+{ return _GLIBCXX_STD_A::find(__first, __last, __pred._M_value); }
+
  template<typename _InputIterator, typename _Tp>
_GLIBCXX20_CONSTEXPR
typename iterator_traits<_InputIterator>::difference_type
--
2.46.2





Re: [PATCH] tree-optimization/117138 - fix ICE with vector comparison in COND_EXPR

2024-10-15 Thread Andrew MacLeod
Good catch.  Probably not a common case, as usually we're already in
supported type contexts by the time we get around to checking range_compatible_p.


I guess it wouldn't hurt to put a gcc_checking_assert in 
range_compatible_p to confirm that they are supported types before 
returning true.
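Something along these lines (an illustrative sketch only, not the posted
patch; range_compatible_p lives in value-range.h):

  inline bool
  range_compatible_p (tree type1, tree type2)
  {
    /* Suggested check: both types must be supported by value_range
       before we can claim they are range-compatible.  */
    gcc_checking_assert (value_range::supports_type_p (type1)
			 && value_range::supports_type_p (type2));
    /* Types are compatible if the signs and precisions match.  */
    return (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
	    && TYPE_SIGN (type1) == TYPE_SIGN (type2));
  }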


Certainly ok.

thanks

Andrew

On 10/15/24 04:27, Richard Biener wrote:

The range folding code of COND_EXPRs missed a check whether the
comparison operand type is supported.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.  I'll
push if that succeeds.  There might be other places missing such
a check, not sure.

Richard.

PR tree-optimization/117138
* gimple-range-fold.cc (fold_using_range::condexpr_adjust):
Check if the comparison operand type is supported.

* gcc.dg/torture/pr117138.c: New testcase.
---
  gcc/gimple-range-fold.cc|  3 ++-
  gcc/testsuite/gcc.dg/torture/pr117138.c | 13 +
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/torture/pr117138.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 65d31adde54..dcd0cae0351 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1139,7 +1139,8 @@ fold_using_range::condexpr_adjust (vrange &r1, vrange &r2, gimple *, tree cond,
       || TREE_CODE_CLASS (gimple_assign_rhs_code (cond_def)) != tcc_comparison)
  return false;
tree type = TREE_TYPE (gimple_assign_rhs1 (cond_def));
-  if (!range_compatible_p (type, TREE_TYPE (gimple_assign_rhs2 (cond_def))))
+  if (!value_range::supports_type_p (type)
+      || !range_compatible_p (type, TREE_TYPE (gimple_assign_rhs2 (cond_def))))
  return false;
range_op_handler hand (gimple_assign_rhs_code (cond_def));
if (!hand)
diff --git a/gcc/testsuite/gcc.dg/torture/pr117138.c b/gcc/testsuite/gcc.dg/torture/pr117138.c
new file mode 100644
index 0000000..b32585d3a56
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117138.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-msse4" { target { x86_64-*-* i?86-*-* } } } */
+
+int a, b;
+_Complex long c;
+
+void
+foo ()
+{
+  do
+b = c || a;
+  while (a);
+}




Re: [PATCH] warning option for traps (-Wtrap)

2024-10-15 Thread Martin Uecker
Am Dienstag, dem 15.10.2024 um 12:15 +0200 schrieb Jakub Jelinek:
> On Tue, Oct 15, 2024 at 11:50:21AM +0200, Richard Biener wrote:
> > > Would it be reasonable to approve this patch now and I try
> > > to improve this later?
> > 
> > On the patch itself:
> > 
> >  void
> >  expand_builtin_trap (void)
> >  {
> > +  if (warn_trap)
> > +{
> > +  location_t current_location =
> 
> Formatting-wise, = shouldn't be at the end of line.
> 
> > +   linemap_unwind_to_first_non_reserved_loc (line_table, 
> > input_location,
> > + NULL);
> > +   warning_at (current_location, OPT_Wtrap, "trap generated");
> > +}
> > +
> >if (targetm.have_trap ())
> > 
> > this also diagnoses calls the user puts in by calling __builtin_trap (),
> > the documentation should probably mention this.  I see the only testcase
> > exercises only this path.  I have doubts -fsanitize-trap with any
> > sanitizer will ever yield a clean binary, so I wonder about practical
> > uses besides very small testcases?
> 
> Given that even simple
> int foo (int x, int y) { return x + y; }
> calls it for -fsanitize=undefined -fsanitize-trap=undefined, and more
> importantly, we try not to optimize away sanitizer checks based on VRP and
> other optimizations at least to some extent, because VRP and other
> optimizations optimize on UB not happening while sanitizers try to catch
> the UB, I have serious doubts about the warning.
> One would need completely different approach, where we try as much as
> possible to prove UB can't happen and only warn if we couldn't prove it
> can't.  Still, there would be tons of false positives where things just
> aren't inlined and we can't prove UB won't happen, or warnings on dead
> code...
> 

It would not be practical with some sanitizers (and sanitizers are also not
the only usecase anyhow). But attached below is the list of emitted warnings
for different sanitizers depending on the optimization level for
BART (https://github.com/mrirecon/bart).  The first number is optimization,
then the number of warnings, and then the number with different warnings
from the same source location (e.g. macros) combined.  The project has
>400 C source files.

For some sanitizers the number is rather low and e.g. shift is something
I was looking at specifically and where I already found this useful.  But
also for some of the others the numbers are low enough to investigate each case.
For new code, an average number of warnings per file of 10 would also
still seem entirely reasonable, so I think one could eliminate all
integer overflow cases if one wanted to. But also doing this only for
critical pieces of code (e.g. using #pragma GCC diagnostic ...) would
seem useful.
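For example (hypothetical usage, since the option is still under review):

  /* Enable the proposed warning only around code that must not trap.  */
  #pragma GCC diagnostic push
  #pragma GCC diagnostic warning "-Wtrap"
  int scale (int a, int b)
  {
    return a * b;  /* may trap under -fsanitize-trap=signed-integer-overflow */
  }
  #pragma GCC diagnostic pop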

One also sees that the optimizer actually does remove a lot of traps. That
the total number sometimes goes up with -O2 is most likely due to code
duplication, but the unique cases always go down with more optimization.
(except for object-size, which does not run fully for -O1, I believe)

I have to admit I did not understand your comment about VRP, but
in my experience we remove UB sanitization based on value ranges, e.g.
in this example for signed overflow.

https://godbolt.org/z/zeMvas3nq
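A minimal example of the effect, along the lines of the godbolt link
(illustrative):

  /* With -fsanitize=signed-integer-overflow -fsanitize-trap=signed-integer-overflow:
     VRP can prove a + b stays within [0, 200], so the overflow check and its
     trap are removed; without the range information they are kept.  */
  int sum_clamped (int a, int b)
  {
    if (a < 0 || a > 100 || b < 0 || b > 100)
      return 0;
    return a + b;
  }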

Martin



shift 0 525 31
shift 1 298 19
shift 2 221 19
signed-integer-overflow 0 12212 11350
signed-integer-overflow 1 6310 5913
signed-integer-overflow 2 3803 3172
integer-divide-by-zero 0 332 332
integer-divide-by-zero 1 135 135
integer-divide-by-zero 2 95 95
unreachable 0 0 0
unreachable 1 0 0
unreachable 2 4 4
vla-bound 0 2671 2426
vla-bound 1 1206 1143
vla-bound 2 1024 906
null 0 32041 26739
null 1 8926 7666
null 2 8622 6769
return 0 0 0
return 1 0 0
return 2 4 4
bounds 0 5031 4830
bounds 1 4776 4551
bounds 2 4986 4547
bounds-strict 0 5031 4830
bounds-strict 1 4776 4551
bounds-strict 2 4986 4547
alignment 0 31818 26524
alignment 1 9848 8441
alignment 2 10031 7855
object-size 0 465 453
object-size 1 5928 5491
object-size 2 6013 5366
float-divide-by-zero 0 532 532
float-divide-by-zero 1 430 429
float-divide-by-zero 2 250 245
float-cast-overflow 0 107 98
float-cast-overflow 1 99 91
float-cast-overflow 2 65 59
nonnull-attribute 0 13433 7672
nonnull-attribute 1 5199 3466
nonnull-attribute 2 1085 1005
return-nonnull-attribute 0 0 0
return-nonnull-attribute 1 0 0
return-nonnull-attribute 2 0 0
bool 0 1053 1036
bool 1 988 972
bool 2 371 363
enum 0 0 0
enum 1 0 0
enum 2 4 4
pointer-overflow 0 29095 25298
pointer-overflow 1 17284 16254
pointer-overflow 2 18598 16110
builtin 0 0 0
builtin 1 0 0
builtin 2 4 4



[pushed: r15-4361] testsuite: simplify analyzer_cpython_plugin.c

2024-10-15 Thread David Malcolm
No functional change intended.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-4361-g77076d85e9aa5e.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/analyzer_cpython_plugin.c: Use success_call_info
in a couple of places to avoid reimplementing get_desc.

Signed-off-by: David Malcolm 
---
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 22 ---
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index d0fe110f20e9..c1510e441e6f 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -963,17 +963,10 @@ public:
 void
 kf_PyList_New::impl_call_post (const call_details &cd) const
 {
-  class success : public call_info
+  class success : public success_call_info
   {
   public:
-success (const call_details &cd) : call_info (cd) {}
-
-label_text
-get_desc (bool can_colorize) const final override
-{
-  return make_label_text (can_colorize, "when %qE succeeds",
-  get_fndecl ());
-}
+success (const call_details &cd) : success_call_info (cd) {}
 
 bool
 update_model (region_model *model, const exploded_edge *,
@@ -1104,17 +1097,10 @@ public:
 void
 kf_PyLong_FromLong::impl_call_post (const call_details &cd) const
 {
-  class success : public call_info
+  class success : public success_call_info
   {
   public:
-success (const call_details &cd) : call_info (cd) {}
-
-label_text
-get_desc (bool can_colorize) const final override
-{
-  return make_label_text (can_colorize, "when %qE succeeds",
-  get_fndecl ());
-}
+success (const call_details &cd) : success_call_info (cd) {}
 
 bool
 update_model (region_model *model, const exploded_edge *,
-- 
2.26.3



[pushed: r15-4360] testsuite, jit: fix test-error-pr63969-missing-driver.c

2024-10-15 Thread David Malcolm
jit.dg/test-error-pr63969-missing-driver.c tries to break PATH and
verify that an error is generated when using an external driver.

However it does this by unsetting PATH, and so the test could
accidentally find the driver if the system supplies a default and the
driver happens to be installed in that path (reported as rhbz#2318021).

Fix the test by instead setting PATH to a bogus value.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-4360-gf8dcb559e615db.

gcc/testsuite/ChangeLog:
* jit.dg/test-error-pr63969-missing-driver.c (create_code): When
breaking PATH, use setenv with a bogus value, rather than
unsetenv, in case the system uses a default path that contains
the driver binary.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/jit.dg/test-error-pr63969-missing-driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/jit.dg/test-error-pr63969-missing-driver.c b/gcc/testsuite/jit.dg/test-error-pr63969-missing-driver.c
index 733522310deb..152e236443cc 100644
--- a/gcc/testsuite/jit.dg/test-error-pr63969-missing-driver.c
+++ b/gcc/testsuite/jit.dg/test-error-pr63969-missing-driver.c
@@ -28,7 +28,7 @@ create_code (gcc_jit_context *ctxt, void *user_data)
   /* Break PATH, so that the driver can't be found
  by gcc::jit::playback::context::compile ()
  within gcc_jit_context_compile.  */
-  unsetenv ("PATH");
+  setenv ("PATH", "/this/is/not/a/valid/path", 1);
 }
 
 void
-- 
2.26.3



[PUSHED] C++: Add opindex for -Wchanges-meaning [PR117157]

2024-10-15 Thread Andrew Pinski
Adds missing opindex for -Wchanges-meaning

Pushed as obvious after building the HTML and checking the index.

gcc/ChangeLog:

PR c++/117157
* doc/invoke.texi (Wno-changes-meaning): Add opindex.

Signed-off-by: Andrew Pinski 
---
 gcc/doc/invoke.texi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4f4ca637549..0db754c888a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6448,6 +6448,8 @@ union U @{
 
 @end itemize
 
+@opindex Wchanges-meaning
+@opindex Wno-changes-meaning
 @item -Wno-changes-meaning @r{(C++ and Objective-C++ only)}
 C++ requires that unqualified uses of a name within a class have the
 same meaning in the complete scope of the class, so declaring the name
-- 
2.43.0



Re: [PATCH v4] RISC-V: add option -m(no-)autovec-segment

2024-10-15 Thread Jeff Law




On 10/15/24 8:56 AM, Patrick O'Neill wrote:

From: Greg McGary 

Add option -m(no-)autovec-segment to enable/disable autovectorizer
from emitting vector segment load/store instructions. This is useful for
performance experiments.

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, 
vec_mask_len_store_lanes):
  Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT.
* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New 
macro.
* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
---
Relying on CI for testing. Please wait for that testing to complete before
committing.
Quick question.  We did something like this to aid internal 
testing/bringup.  Our variant adjusted a ton of the mode iterators in 
vector-iterators.md and the TUPLE_ENTRY stuff in riscv-vector-switch.def.


Robin, do you remember why you had to adjust all the iterators?  Was it 
that LTO issue we had with the early variants, or something else?


jeff



[PUSHED] C++: Regenerate c.opt.urls [PR117157]

2024-10-15 Thread Andrew Pinski
I forgot to regenerate the c.opt.urls files after adding the opindex for 
changes-meaning.
Fixed thusly.

gcc/c-family/ChangeLog:

PR c++/117157
* c.opt.urls: Regenerate.

Signed-off-by: Andrew Pinski 
---
 gcc/c-family/c.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index c1738095e6d..d045af14c3f 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -220,6 +220,9 @@ UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wcatch-value)
 Wcatch-value=
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wcatch-value)
 
+Wchanges-meaning
+UrlSuffix(gcc/Warning-Options.html#index-Wchanges-meaning)
+
 Wchar-subscripts
 UrlSuffix(gcc/Warning-Options.html#index-Wchar-subscripts)
 
-- 
2.43.0



[committed] testsuite/i386: Require AVX2 effective target in pr107432-9.c

2024-10-15 Thread Uros Bizjak
x86-64-v3 requires AVX2 effective target and AVX2 specific avx2-check.h.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr107432-9.c: Require AVX2 effective target.
Include avx2-check.h instead of avx-check.h.  Define TEST to avx2_test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr107432-9.c b/gcc/testsuite/gcc.target/i386/pr107432-9.c
index 90426c030c4..861db17a5ff 100644
--- a/gcc/testsuite/gcc.target/i386/pr107432-9.c
+++ b/gcc/testsuite/gcc.target/i386/pr107432-9.c
@@ -1,11 +1,13 @@
 /* { dg-do run } */
 /* { dg-options "-march=x86-64-v3 -O2 -flax-vector-conversions" } */
+/* { dg-require-effective-target avx2 } */
+
 #include 
 
-#include "avx-check.h"
+#include "avx2-check.h"
 
 #ifndef TEST
-#define TEST avx_test
+#define TEST avx2_test
 #endif
 
 typedef short __v2hi __attribute__ ((__vector_size__ (4)));


Re: [PATCH] Introduce TARGET_FMV_ATTR_SEPARATOR

2024-10-15 Thread Andrew Carlotti
On Tue, Oct 15, 2024 at 02:18:43PM +0800, Yangyu Chen wrote:
> Some architectures may use ',' in the attribute string, but it is not
> used as the separator for different targets. To avoid conflict, we
> introduce a new macro TARGET_FMV_ATTR_SEPARATOR to separate different
> clones.

This is only for the target_clones attribute, so how about calling it
TARGET_CLONES_ATTR_SEPARATOR instead (or pluralised - see below)?

> As an example, according to RISC-V C-API Specification [1], RISC-V allows
> ',' in the attribute string in the "arch=" option to specify one or more
> ISA extensions in the same target function, which conflicts with the
> default separator used to separate different clones. This patch introduces
> TARGET_FMV_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator,
> since '#' is not allowed in the target_clones option string.
> 
> [1] https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string
> 
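A concrete illustration of the conflict (the extension names are just
examples):

  /* Per the RISC-V C-API one version string may itself contain a comma.
     With ',' as the clone separator the attribute below would be mis-split
     into three entries ("default", "arch=+zba", "+zbb"); with '#' the
     joined string "default#arch=+zba,+zbb" splits into the intended
     two clones.  */
  __attribute__((target_clones("default", "arch=+zba,+zbb")))
  int foo (int x)
  {
    return x + 1;
  }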
> gcc/ChangeLog:
> 
> * defaults.h (TARGET_FMV_ATTR_SEPARATOR): Define new macro.
> * multiple_target.cc (get_attr_str): Use
>   TARGET_FMV_ATTR_SEPARATOR to separate attributes.
> (separate_attrs): Likewise.
> * config/riscv/riscv.h (TARGET_FMV_ATTR_SEPARATOR): Define
>   TARGET_FMV_ATTR_SEPARATOR for RISC-V.
> ---
>  gcc/config/riscv/riscv.h |  5 +
>  gcc/defaults.h   |  4 
>  gcc/multiple_target.cc   | 19 ---
>  3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index ca1b8329cdc..858cab72a4c 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1298,4 +1298,9 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
>   STACK_BOUNDARY / BITS_PER_UNIT)\
>  : (crtl->outgoing_args_size + STACK_POINTER_OFFSET))
>  
> +/* According to the RISC-V C API, the arch string may contain ','.  To avoid
> +   the conflict with the default separator, we choose '#' as the separator for
> +   the target attribute.  */
> +#define TARGET_FMV_ATTR_SEPARATOR '#'
> +
>  #endif /* ! GCC_RISCV_H */
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index ac2d25852ab..f451efcb33e 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -874,6 +874,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define TARGET_HAS_FMV_TARGET_ATTRIBUTE 1
>  #endif
>  
> +/* Select an attribute separator for function multiversioning.  */
> +#ifndef TARGET_FMV_ATTR_SEPARATOR
> +#define TARGET_FMV_ATTR_SEPARATOR ','
> +#endif
>  
>  /* Select a format to encode pointers in exception handling data.  We
> prefer those that result in fewer dynamic relocations.  Assume no
> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> index 1fdd279da04..5a056b44571 100644
> --- a/gcc/multiple_target.cc
> +++ b/gcc/multiple_target.cc
> @@ -180,7 +180,7 @@ create_dispatcher_calls (struct cgraph_node *node)
>  }
>  }
>  
> -/* Create string with attributes separated by comma.
> +/* Create string with attributes separated by TARGET_FMV_ATTR_SEPARATOR.
> Return number of attributes.  */
>  
>  static int
> @@ -194,17 +194,21 @@ get_attr_str (tree arglist, char *attr_str)
>  {
>const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
>size_t len = strlen (str);
> -  for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ','))
> +  for (const char *p = strchr (str, TARGET_FMV_ATTR_SEPARATOR);
> +p;
> +p = strchr (p + 1, TARGET_FMV_ATTR_SEPARATOR))
>   argnum++;
>memcpy (attr_str + str_len_sum, str, len);
> -  attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
> +  attr_str[str_len_sum + len]
> + = TREE_CHAIN (arg) ? TARGET_FMV_ATTR_SEPARATOR : '\0';
>str_len_sum += len + 1;
>argnum++;
>  }
>return argnum;
>  }
>  
> -/* Return number of attributes separated by comma and put them into ARGS.
> +/* Return number of attributes separated by TARGET_FMV_ATTR_SEPARATOR and put
> +   them into ARGS.
> If there is no DEFAULT attribute return -1.
> If there is an empty string in attribute return -2.
> If there are multiple DEFAULT attributes return -3.
> @@ -215,9 +219,10 @@ separate_attrs (char *attr_str, char **attrs, int attrnum)
>  {
>int i = 0;
>int default_count = 0;
> +  char separator_str[] = {TARGET_FMV_ATTR_SEPARATOR, '\0'};

How about defining the macro as a string (and appending an S to the name - e.g.
TARGET_CLONES_ATTR_SEPARATORS)?
 
> -  for (char *attr = strtok (attr_str, ",");
> -   attr != NULL; attr = strtok (NULL, ","))
> +  for (char *attr = strtok (attr_str, separator_str);
> +   attr != NULL; attr = strtok (NULL, separator_str))
>  {
>if (strcmp (attr, "default") == 0)
>   {
> @@ -305,7 +310,7 @@ static bool
>  expand_target_clones (struct cgraph_node *n

RE: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 15, 2024 12:13 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends using
> VEC_PERM_EXPR
> 
> On Tue, 15 Oct 2024, Tamar Christina wrote:
> 
> > Hi,
> >
> > Thanks for the look,
> >
> > The 10/15/2024 09:54, Richard Biener wrote:
> > > On Mon, 14 Oct 2024, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch series adds support for a target to do a direct conversion 
> > > > for zero
> > > > extends using permutes.
> > > >
> > > > To do this it uses a target hook use_permute_for_promotion which must be
> > > > implemented by targets.  This hook is used to indicate:
> > > >
> > > >  1. can a target do this for the given modes.
> > >
> > > can_vec_perm_const_p?
> > >
> > > >  3. can the target convert between various vector modes with a
> VIEW_CONVERT.
> > >
> > > We have modes_tieable_p for this I think.
> > >
> >
> > Yes, though the reason I didn't use either of them was because they are 
> > reporting
> > a capability of the backend.  In which case the hook, which is already 
> > backend
> > specific already should answer these two.
> >
> > I initially had these checks there, but they didn't seem to add value, for
> > promotions the masks are only dependent on the input and output modes. So
> they really
> > don't change.
> >
> > When you have say a loop that does lots of conversions from say char to 
> > int, it
> seemed
> > like a waste to retest the same permute constants over and over again.
> >
> > I can add them back in if you prefer...
> >
> > > >  2. is it profitable for the target to do it.
> > >
> > > So you say the target can do both ways but both zip and tbl are
> > > permute instructions so I really fail to see the point and why
> > > the target itself doesn't choose to use tbl for unpack.
> > >
> > > Is the intent in the end to have VEC_PERM in the IL rather than
> > > VEC_UNPACK_* so it combines with other VEC_PERMs?
> > >
> >
> > Yes, and this happens quite often, e.g. load permutes or lane shuffles etc.
> > The reason for exposing them as VEC_PERM was to trigger further 
> > optimizations.
> >
> > If you remember the ticket about LOAD_LANES, with this optimization and an
> > open encoding of LOAD_LANES we stop using it in cases where there's a zero
> > extend after the LOAD_LANES, because then you're doing effectively two
> > permutes and the LOAD_LANES is no longer beneficial. There are other
> > examples, load and replicate etc.
> >
> > > That said, I'm not against supporting VEC_PERM code gen from
> > > unsigned promotion but I don't see why we should do this when
> > > the target advertises VEC_UNPACK_* support or direct conversion
> > > support?
> > >
> > > Esp. with adding a "local" cost related hook which cannot take
> > > into accout context.
> > >
> >
> > To summarize a long story:
> >
> >   yes I open encode zero extends as permutes to allow further optimizations.
> >   One could convert vec_unpacks to convert optabs and use that, but that is
> >   an opaque value that can't be further optimized.
> >
> >   The hook isn't really a costing thing in the general sense. It's literally
> >   just "do you want permutes yes or no".  The reason it gets the modes is
> >   simply that I don't think a single level extend is worth it, but I can
> >   just change it to never try to do this on more than one level.
> 
> When you mention LOAD_LANES we do not expose "permutes" in them on
> GIMPLE
> either, so why should we for VEC_UNPACK_*.

I think not exposing LOAD_LANES in GIMPLE *is* an actual mistake that I hope to
correct in GCC-16.  Or at least the time we pick LOAD_LANES is too early.  So I
don't think pointing to this is a convincing argument.  It's only VLA that I
think needs the IL, because you have to mask the group of operations and it may
be hard to reconcile that later on.

> At what level are the simplifications you see happening then?

Well, they are currently happening outside of the vectorizer passes itself,
more specifically in this case because VN runs match simplifications.

If the concern is that that's late I can lift it to a pattern I suppose.
I didn't use a pattern because similar changes in this area always just happened
at codegen.
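For illustration, the kind of folding meant here (schematic GIMPLE on byte
elements, little-endian lane numbering; the masks are made up for the
example): a reversing shuffle feeding a zero extend written as a
permute-with-zero collapses into a single permute once both are
VEC_PERM_EXPRs:

  _1 = VEC_PERM_EXPR <a_2(D), a_2(D), { 7, 6, 5, 4, 3, 2, 1, 0 }>;
  _2 = VEC_PERM_EXPR <_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 0, 8, 1, 8, 2, 8, 3, 8 }>;

is simplified to

  _2 = VEC_PERM_EXPR <a_2(D), { 0, 0, 0, 0, 0, 0, 0, 0 }, { 7, 8, 6, 8, 5, 8, 4, 8 }>;

whereas a VEC_UNPACK_LO_EXPR in place of the second statement would block
the fold.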

> 
> I do realize we have two ways of expressing zero-extending widenings
> (also truncations btw) and that's always bad - so we could decide to
> _always_ use VEC_PERMs as the canonical representation because those
> combine more easily.  And either match VEC_PERMs back to vec_unpack
> at RTL expansion time or require targets to expose those as constant
> vec_perms as well.  There are targets like GCN where you can't do
> unpacking with permutes of course, so we can't do away with them
> (we could possibly force those targets to expose widening/truncation
> solely with [us]ext and trunc patterns of course).

Ok, so your objection is that you don

[PATCH 1/3] Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

2024-10-15 Thread Richard Biener
The following prepares us for SLP instances with a non-uniform number
of lanes.  We already have this with load permutation lowering, but
we managed to keep that within the constraints of the per SLP instance
computed VF based on its max_nunits (with a vector type fixed for
each node) and the instance group size which is the number of lanes
in the SLP instance root.  But in the case where arbitrary splitting
and merging SLP nodes at non-power-of-two lane boundaries is allowed
this simple calculation based on the outgoing group size falls apart.

The following, instead of computing a VF during SLP instance
discovery, computes it at vect_make_slp_decision time by walking
the SLP graph and looking at each SLP node in isolation.  We do
track max_nunits per node which could be a VF per node instead or
forgo with both completely (though for BB vectorization we need
to communicate a VF > 1 requirement upward, or compute that after
the fact).  In the end we'd like to delay vector type assignment
and only compute a minimum VF here, allowing vector types to
grow when the actual VF is bigger.

There's a slight complication with permutes of externs / constants
as those get their vector type (and thus max_nunits) assigned late.
While we force them to have the same vector type as the result at
the moment their number of lanes can differ.  So those get handled
explicitly there right now to up the VF as needed - the alternative
is to fail vectorization, I have an addition to
vect_maybe_update_slp_op_vectype that would FAIL if the set
vector type isn't within the constraints of the VF.
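As a rough sketch of the walk described above (illustrative code, not the
patch itself; the real function is vect_update_slp_vf_for_node in
tree-vect-slp.cc):

  /* Combine NODE's contribution into VF; each node in isolation needs
     max_nunits elements distributed over SLP_TREE_LANES copies, and the
     final VF is a common multiple over all nodes.  */
  static void
  update_vf_for_node (slp_tree node, poly_uint64 &vf,
		      hash_set<slp_tree> &visited)
  {
    if (!node || visited.add (node))
      return;
    poly_uint64 node_vf
      = calculate_unrolling_factor (node->max_nunits, SLP_TREE_LANES (node));
    vf = force_common_multiple (vf, node_vf);
    slp_tree child;
    unsigned i;
    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
      update_vf_for_node (child, vf, visited);
  }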

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-vectorizer.h (SLP_INSTANCE_UNROLLING_FACTOR): Remove.
(slp_instance::unrolling_factor): Likewise.
* tree-vect-slp.cc (vect_build_slp_instance): Do not set
SLP_INSTANCE_UNROLLING_FACTOR.  Remove then dead code.
Compute and set max_nunits from the RHS nodes merged.
(vect_update_slp_vf_for_node): New function.
(vect_make_slp_decision): Use vect_update_slp_vf_for_node
to compute VF recursively.
(vect_build_slp_store_interleaving): Get max_nunits and
properly set that on the permute nodes built.
(vect_analyze_slp): Do not set SLP_INSTANCE_UNROLLING_FACTOR.
---
 gcc/tree-vect-slp.cc  | 72 +--
 gcc/tree-vectorizer.h |  4 ---
 2 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 28acd9ad147..959468cad8a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3557,13 +3557,15 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
 static slp_tree
 vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
-  vec<stmt_vec_info> &scalar_stmts)
+  vec<stmt_vec_info> &scalar_stmts,
+  poly_uint64 max_nunits)
 {
   unsigned int group_size = scalar_stmts.length ();
   slp_tree node = vect_create_new_slp_node (scalar_stmts,
SLP_TREE_CHILDREN
  (rhs_nodes[0]).length ());
   SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
+  node->max_nunits = max_nunits;
   for (unsigned l = 0;
l < SLP_TREE_CHILDREN (rhs_nodes[0]).length (); ++l)
 {
@@ -3573,6 +3575,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
   SLP_TREE_CHILDREN (node).quick_push (perm);
   SLP_TREE_LANE_PERMUTATION (perm).create (group_size);
   SLP_TREE_VECTYPE (perm) = SLP_TREE_VECTYPE (node);
+  perm->max_nunits = max_nunits;
   SLP_TREE_LANES (perm) = group_size;
   /* ???  We should set this NULL but that's not expected.  */
   SLP_TREE_REPRESENTATIVE (perm)
@@ -3628,6 +3631,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
  SLP_TREE_LANES (permab) = n;
  SLP_TREE_LANE_PERMUTATION (permab).create (n);
  SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ permab->max_nunits = max_nunits;
  /* ???  Should be NULL but that's not expected.  */
  SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
  SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3698,6 +3702,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
  SLP_TREE_LANES (permab) = n;
  SLP_TREE_LANE_PERMUTATION (permab).create (n);
  SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ permab->max_nunits = max_nunits;
  /* ???  Should be NULL but that's not expected.  */
  SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
  SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3828,7 +3833,6 @@ vect_build_slp_instance (vec_info *vinfo,
  /* Create a new SLP instance.  */
  slp_instance new_instance = XNEW (class _slp_instance);
  SLP_INSTANCE_TREE (new_instance) = node;
- SLP

Re: [PATCH 1/2] [Middle-end] Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).

2024-10-15 Thread Richard Biener
On Tue, Oct 15, 2024 at 5:30 AM liuhongt  wrote:
>
> For x86 masked fma, there're 2 rtl representations
> 1) (vec_merge (fma op2 op1 op3) op1 mask)
> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>
>  5894(define_insn "<avx512>_fmadd_<mode>_mask"
>  5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>  5896(vec_merge:VFH_AVX512VL
>  5897  (fma:VFH_AVX512VL
>  5898(match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>  5899(match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
>  5900(match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
>  5901  (match_dup 1)
>  5902  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
>  5903  "TARGET_AVX512F && <round_mode512bit_condition>"
>  5904  "@
>  5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, <round_op5>%2}
>  5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, <round_op5>%3}"
>  5907  [(set_attr "type" "ssemuladd")
>  5908   (set_attr "prefix" "evex")
>  5909   (set_attr "mode" "<MODE>")])
>
> Here op1 has constraint "0", and the second op1 is (match_dup 1),
> we once tried to replace it with (match_operand:M 5
> "nonimmediate_operand" "0")) to enable more flexibility for pattern
> match and recog, but it triggered an ICE in reload (reload can handle
> at most one operand with "0" constraint).
>
> So we need to either add 2 patterns in the backend or just do the
> canonicalization in the middle-end.
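For reference, the C-level operation behind these two forms is the masked-FMA
intrinsic, where the merge value is also the first multiplicand
(illustrative):

  #include <immintrin.h>

  /* Inactive lanes of m keep a; active lanes get a * b + c.  Since the
     multiplicands commute, (fma b a c) merged with a is the same operation
     as (fma a b c) merged with a, which is why one canonical RTL form
     (and thus a single backend pattern) suffices.  */
  __m512
  masked_fma (__m512 a, __m512 b, __m512 c, __mmask16 m)
  {
    return _mm512_mask_fmadd_ps (a, m, b, c);
  }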

Nice spot to handle this.  OK with the minor nit below fixed
and in case there are no further comments from CCed folks.

I'll note there's (vec_select (vec_concat (...))) as an alternate
way to perform a (vec_merge ...) but I don't feel strongly
about supporting that alternative without evidence it's used.
aarch64 seems to use an UNSPEC for masking but it
seems to have at least two patterns to merge with
either the first or the third input but failing to handle the
other (second) operand of a multiplication (*cond_fma_2 and _4);
as both are "register_operand" I don't see how canonicalization
works there?  Of course we can't do anything for UNSPECs.
RISC-V has mastered to obfuscate its machine description so I
have no idea ;)

Richard.

> gcc/ChangeLog:
>
> * combine.cc (maybe_swap_commutative_operands):
> Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
> to (vec_merge (fma op1 op2 op3) op1 mask).
> ---
>  gcc/combine.cc | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index fef06a6cdc0..aa40fdcc50d 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -5656,6 +5656,31 @@ maybe_swap_commutative_operands (rtx x)
>SUBST (XEXP (x, 1), temp);
>  }
>
> +  /* Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
> + (vec_merge (fma op1 op2 op3) op1 mask).  */
> +  if (GET_CODE (x) == VEC_MERGE
> +  && GET_CODE (XEXP (x, 0)) == FMA)
> +{
> +  rtx fma_op1 = XEXP (XEXP (x, 0), 0);
> +  rtx fma_op2 = XEXP (XEXP (x, 0), 1);
> +  rtx masked_op = XEXP (x, 1);
> +  if (rtx_equal_p (masked_op, fma_op2))
> +   {
> + if (GET_CODE (fma_op1) == NEG)
> +   {

please add a comment like

  /* Keep the negate canonicalized to the first operand.  */

> + fma_op1 = XEXP (fma_op1, 0);
> + SUBST (XEXP (XEXP (XEXP (x, 0), 0), 0), fma_op2);
> + SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
> +   }
> + else
> +   {
> + SUBST (XEXP (XEXP (x, 0), 0), fma_op2);
> + SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
> +   }
> +

stray vertical space

> +   }
> +}
> +
>unsigned n_elts = 0;
>if (GET_CODE (x) == VEC_MERGE
>&& CONST_INT_P (XEXP (x, 2))
> --
> 2.31.1
>


[PATCH 3/3] Avoid using SLP_TREE_LOAD_PERMUTATION for non-grouped SLP loads

2024-10-15 Thread Richard Biener
The following makes sure to use a VEC_PERM SLP node to produce
lane duplications for non-grouped SLP loads as those are later
not lowered by load permutation lowering.

For some reason gcc.dg/vect/pr106081.c now fails permute optimizing,
in particular eliding vector reversal for the reduction.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-vect-slp.cc (vect_build_slp_tree_2): Use a VEC_PERM
SLP node to duplicate lanes for non-grouped loads.

* gcc.dg/vect/pr106081.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/pr106081.c |  2 +-
 gcc/tree-vect-slp.cc | 38 +++-
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr106081.c b/gcc/testsuite/gcc.dg/vect/pr106081.c
index 8f97af2d642..1864320c803 100644
--- a/gcc/testsuite/gcc.dg/vect/pr106081.c
+++ b/gcc/testsuite/gcc.dg/vect/pr106081.c
@@ -30,4 +30,4 @@ test(double *k)
 }
 
 /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM" 4 "optimized" { target x86_64-*-* i?86-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM" 5 "optimized" { target x86_64-*-* i?86-*-* } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af00c5e35dd..b34064103bd 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2088,7 +2088,43 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
}
  else
{
- SLP_TREE_LOAD_PERMUTATION (node) = load_permutation;
+ if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   {
+ /* Do not use SLP_TREE_LOAD_PERMUTATION for non-grouped
+accesses.  Instead, when duplicating, do so via a
+VEC_PERM node.  */
+ if (!any_permute)
+   load_permutation.release ();
+ else
+   {
+ gcc_assert (group_size != 1);
+ vec<stmt_vec_info> stmts2;
+ stmts2.create (1);
+ stmts2.quick_push (stmt_info);
+ bool matches2;
+ slp_tree unperm_load
+   = vect_build_slp_tree (vinfo, stmts2, 1,
+  &this_max_nunits, &matches2,
+  limit, &this_tree_size, bst_map);
+ gcc_assert (unperm_load);
+ lane_permutation_t lperm;
+ lperm.create (group_size);
+ for (unsigned j = 0; j < load_permutation.length (); ++j)
+   {
+ gcc_assert (load_permutation[j] == 0);
+ lperm.quick_push (std::make_pair (0, 0));
+   }
+ SLP_TREE_CODE (node) = VEC_PERM_EXPR;
+ SLP_TREE_CHILDREN (node).safe_push (unperm_load);
+ SLP_TREE_LANE_PERMUTATION (node) = lperm;
+ load_permutation.release ();
+ *max_nunits = this_max_nunits;
+ (*tree_size)++;
+ return node;
+   }
+   }
+ else
+   SLP_TREE_LOAD_PERMUTATION (node) = load_permutation;
  return node;
}
}
-- 
2.43.0


[PATCH 2/3] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

2024-10-15 Thread Richard Biener
The following is a more complete fix for PR117050, restoring the
ability to permute non-grouped .MASK_LOAD with.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/117050
* tree-vect-slp.cc (vect_build_slp_tree_2): Properly handle
non-grouped masked loads when handling permutations.
---
 gcc/tree-vect-slp.cc | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 959468cad8a..af00c5e35dd 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1991,7 +1991,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  stmt_vec_info load_info;
  load_permutation.create (group_size);
  stmt_vec_info first_stmt_info
-   = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
+   = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ ? DR_GROUP_FIRST_ELEMENT (stmt_info) : stmt_info;
  bool any_permute = false;
  bool any_null = false;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
@@ -2035,8 +2036,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 loads with gaps.  */
  if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
   && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
- || STMT_VINFO_STRIDED_P (stmt_info)
- || (!STMT_VINFO_GROUPED_ACCESS (stmt_info) && any_permute))
+ || STMT_VINFO_STRIDED_P (stmt_info))
{
  load_permutation.release ();
  matches[0] = false;
@@ -2051,17 +2051,17 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
{
  /* Discover the whole unpermuted load.  */
  vec<stmt_vec_info> stmts2;
- stmts2.create (DR_GROUP_SIZE (first_stmt_info));
- stmts2.quick_grow_cleared (DR_GROUP_SIZE (first_stmt_info));
+ unsigned dr_group_size = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ ? DR_GROUP_SIZE (first_stmt_info) : 1;
+ stmts2.create (dr_group_size);
+ stmts2.quick_grow_cleared (dr_group_size);
  unsigned i = 0;
  for (stmt_vec_info si = first_stmt_info;
   si; si = DR_GROUP_NEXT_ELEMENT (si))
stmts2[i++] = si;
- bool *matches2
-   = XALLOCAVEC (bool, DR_GROUP_SIZE (first_stmt_info));
+ bool *matches2 = XALLOCAVEC (bool, dr_group_size);
  slp_tree unperm_load
-   = vect_build_slp_tree (vinfo, stmts2,
-  DR_GROUP_SIZE (first_stmt_info),
+   = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
   &this_max_nunits, matches2, limit,
   &this_tree_size, bst_map);
  /* When we are able to do the full masked load emit that
-- 
2.43.0



Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread John Paul Adrian Glaubitz
Hi Maciej,

On Tue, 2024-10-15 at 13:36 +0100, Maciej W. Rozycki wrote:
> > IMO, we should simply deprecate non-BWX targets. If reload is going
> > away, then there is no way for non-BWX targets to access reload
> > internals they require for compilation. As mentioned in the PR,
> > non-BWX targets are removed from distros anyway, so I guess there is
> > no point to invest much time to modernize them,
> 
>  Well, I have a lasting desire to keep non-BWX Alphas running, under Linux 
> in particular, and I'm going to look into any issues around it; reload vs 
> LRA is all software, so things can always be sorted one way or another.
> 
>  While I've been distracted by other matters lately, such as hardware 
> failures that had to be dealt with urgently, this is now my priority #1 
> and I do hope to have at least some critical initial stuff in with this 
> release cycle (noting that only ~5 weeks are left).

That's great.

While I'm not really an expert for compiler development, I have beefy hardware
available for GCC and kernel build tests, so if you have any patches for 
testing,
please let me know.

>  NB I spoke to Richard about it while at LPC 2024 recently.

OK, good.

FWIW, it *seems* that LRA just works with EV56 as the baseline and the
following replacements in the code:

s/reload_in_progress/reload_in_progress || lra_in_progress/g

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 4/4]middle-end: create the longest possible zero extend chain after overwidening

2024-10-15 Thread Richard Biener
On Mon, 14 Oct 2024, Tamar Christina wrote:

> Hi All,
> 
> Consider loops such as:
> 
> void test9(unsigned char *x, long long *y, int n, unsigned char k) {
> for(int i = 0; i < n; i++) {
> y[i] = k + x[i];
> }
> }
> 
> where today we generate:
> 
> .L5:
> ldr q29, [x5], 16
> add x4, x4, 128
> uaddl   v1.8h, v29.8b, v30.8b
> uaddl2  v29.8h, v29.16b, v30.16b
> zip1v2.8h, v1.8h, v31.8h
> zip1v0.8h, v29.8h, v31.8h
> zip2v1.8h, v1.8h, v31.8h
> zip2v29.8h, v29.8h, v31.8h
> sxtlv25.2d, v2.2s
> sxtlv28.2d, v0.2s
> sxtlv27.2d, v1.2s
> sxtlv26.2d, v29.2s
> sxtl2   v2.2d, v2.4s
> sxtl2   v0.2d, v0.4s
> sxtl2   v1.2d, v1.4s
> sxtl2   v29.2d, v29.4s
> stp q25, q2, [x4, -128]
> stp q27, q1, [x4, -96]
> stp q28, q0, [x4, -64]
> stp q26, q29, [x4, -32]
> cmp x5, x6
> bne .L5
> 
> Note how the zero extend from short to long is halfway through the chain transformed
> into a sign extend.  There are two problems with this:
> 
>   1. sign extends are typically slower than zero extends on many uArches.
>   2. it prevents vectorizable_conversion from attempting to do a single step
>  promotion.
> 
> These sign extends happen due to the various range reduction optimizations and
> patterns we have, such as multiplication widening, etc.
> 
> My first attempt to fix this was just updating the patterns to when the 
> original
> source is a zero extend, to not add the intermediate sign extend.
> 
> However this behavior happens in many other places, some of it and as new
> patterns get added the problem can be re-introduced.
> 
> Instead I have added a new pattern vect_recog_zero_extend_chain_pattern that
> attempts to simplify and extend an existing zero extend over multiple
> conversions statements.
> 
> As an example, T3 a = (T3)(signed T2)(unsigned T1)x where bitsize T3 > T2 > T1
> gets transformed into T3 a = (T3)(signed T2)(unsigned T2)x.
> 
> The final cast to signed is kept so the types in the tree still match. It will
> be correctly elided later on.
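Concretely (an illustrative instance of the quoted transformation, with
T1 = unsigned char, T2 = short, T3 = long long):

  long long
  widen (unsigned char x)
  {
    /* Before the pattern: (long long)(short)x -- the short -> long long
       step is a sign extend.  After: the chain below, where the only real
       extensions are zero extends; the value-preserving cast to short is
       elided later.  */
    return (long long)(short)(unsigned short)x;
  }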
> 
> This representation is the most optimal as vectorizable_conversion is already
> able to decompose a long promotion into multiple steps if the target does not
> support it in a single step.  More importantly it allows us to do proper 
> costing
> and support such conversions like (double)x, where bitsize(x) < int in an
> efficient manner.
> 
> To do this I have used Ranger's on-demand analysis to perform the check to see
> if an extension can be removed and extended to zero extend.  The reason for 
> this
> is that the vectorizer introduces several patterns that are not in the IL,  
> but
> also lots of widening IFNs for which handling in a switch wouldn't be very
> future proof.
> 
> I did try to do it without Ranger, but ranger had two benefits:
> 
> 1.  It simplified the handling of the IL changes the vectorizer introduces, 
> and
> makes it future proof.
> 2.  Ranger has the advantage of doing the transformation in cases where it 
> knows
> that the top bits of the value is zero.  Which we wouldn't be able to tell
> by looking purely at statements.
> 3.  Ranger simplified the handling of corner cases.  Without it the handling 
> was
> quite complex and I wasn't very confident in it's correctness.
> 
> So I think ranger is the right way to go here...  With these changes the above
> now generates:
> 
> .L5:
> add x4, x4, 128
> ldr q26, [x5], 16
> uaddl   v2.8h, v26.8b, v31.8b
> uaddl2  v26.8h, v26.16b, v31.16b
> tbl v4.16b, {v2.16b}, v30.16b
> tbl v3.16b, {v2.16b}, v29.16b
> tbl v24.16b, {v2.16b}, v28.16b
> tbl v1.16b, {v26.16b}, v30.16b
> tbl v0.16b, {v26.16b}, v29.16b
> tbl v25.16b, {v26.16b}, v28.16b
> tbl v2.16b, {v2.16b}, v27.16b
> tbl v26.16b, {v26.16b}, v27.16b
> stp q4, q3, [x4, -128]
> stp q1, q0, [x4, -64]
> stp q24, q2, [x4, -96]
> stp q25, q26, [x4, -32]
> cmp x5, x6
> bne .L5
> 
> I have also seen similar improvements in codegen on Arm and x86_64, especially
> with AVX512.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Hopefully Ok for master?

Hohumm.  So I looked at one of the examples and I don't see any
sign-extends in the IL we vectorize.  So your pattern is about
changing int -> double to unsigned int -> double but only so
a required intermediate int -> long conversion is done as
zero-extend?  IMO this doesn't belong to patterns but to
vectorizable_conversion, specifically the step determining the
intermediate types.

I don't quite understand what scalar pattern IL you feed to
the vectorizer in the end, 

RE: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 15, 2024 1:20 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: RE: [PATCH 1/4]middle-end: support multi-step zero-extends using
> VEC_PERM_EXPR
> 
> On Tue, 15 Oct 2024, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, October 15, 2024 12:13 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends using
> > > VEC_PERM_EXPR
> > >
> > > On Tue, 15 Oct 2024, Tamar Christina wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for the look,
> > > >
> > > > The 10/15/2024 09:54, Richard Biener wrote:
> > > > > On Mon, 14 Oct 2024, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > This patch series adds support for a target to do a direct 
> > > > > > convertion for
> zero
> > > > > > extends using permutes.
> > > > > >
> > > > > > To do this it uses a target hook use_permute_for_promotio which 
> > > > > > must be
> > > > > > implemented by targets.  This hook is used to indicate:
> > > > > >
> > > > > >  1. can a target do this for the given modes.
> > > > >
> > > > > can_vec_perm_const_p?
> > > > >
> > > > > >  3. can the target convert between various vector modes with a
> > > VIEW_CONVERT.
> > > > >
> > > > > We have modes_tieable_p for this I think.
> > > > >
> > > >
> > > > Yes, though the reason I didn't use either of them was because they are
> reporting
> > > > a capability of the backend.  In which case the hook, which is already 
> > > > backend
> > > > specific already should answer these two.
> > > >
> > > > I initially had these checks there, but they didn't seem to add value, 
> > > > for
> > > > promotions the masks are only dependent on the input and output modes.
> So
> > > they really
> > > > don't change.
> > > >
> > > > When you have say a loop that does lots of conversions from say char to 
> > > > int,
> it
> > > seemed
> > > > like a waste to retest the same permute constants over and over again.
> > > >
> > > > I can add them back in if you prefer...
> > > >
> > > > > >  2. is it profitable for the target to do it.
> > > > >
> > > > > So you say the target can do both ways but both zip and tbl are
> > > > > permute instructions so I really fail to see the point and why
> > > > > the target itself doesn't choose to use tbl for unpack.
> > > > >
> > > > > Is the intent in the end to have VEC_PERM in the IL rather than
> > > > > VEC_UNPACK_* so it combines with other VEC_PERMs?
> > > > >
> > > >
> > > > Yes, and this happens quite often, e.g. load permutes or lane shuffles 
> > > > etc.
> > > > The reason for exposing them as VEC_PERM was to trigger further
> optimizations.
> > > >
> > > > If you remember the ticket about LOAD_LANES, with this optimization and 
> > > > an
> > > open
> > > > encoding of LOAD_LANES we stop using it in cases where theres a zero 
> > > > extend
> > > after
> > > > the LOAD_LANES, because then you're doing effectively two permutes and
> the
> > > LOAD_LANES
> > > > is no longer beneficial. There are other examples, load and replicate 
> > > > etc.
> > > >
> > > > > That said, I'm not against supporting VEC_PERM code gen from
> > > > > unsigned promotion but I don't see why we should do this when
> > > > > the target advertises VEC_UNPACK_* support or direct conversion
> > > > > support?
> > > > >
> > > > > Esp. with adding a "local" cost related hook which cannot take
> > > > > into accout context.
> > > > >
> > > >
> > > > To summarize a long story:
> > > >
> > > >   yes I open encode zero extends as permutes to allow further 
> > > > optimizations.
> > > One could convert
> > > >   vec_unpacks to convert optabs and use that, but that is an opague 
> > > > value
> that
> > > can't be further
> > > >   optimized.
> > > >
> > > >   The hook isn't really a costing thing in the general sense. It's 
> > > > literally just "do
> you
> > > want
> > > >   permutes yes or no".  The reason it gets the modes is simply that I 
> > > > don't
> think a
> > > single level
> > > >   extend is worth it, but I can just change it to never try to do this 
> > > > on more
> than
> > > one level.
> > >
> > > When you mention LOAD_LANES we do not expose "permutes" in them on
> > > GIMPLE
> > > either, so why should we for VEC_UNPACK_*.
> >
> > I think not exposing LOAD_LANES in GIMPLE *is* an actual mistake that I 
> > hope to
> correct in GCC-16.
> > Or at least the time we pick LOAD_LANES is too early.  So I don't think 
> > pointing to
> this is a convincing
> > argument.  It's only VLA that I think needs the IL because you have to mask 
> > the
> group of operations and
> > may be hard to reconcile that later on.
> >
> > > At what level are the simplifications you see happening then?
> >
> > Well, they are currently happening outside of the vectorizer passes itself,
> > more specifically in this case because VN runs

RE: [PATCH 2/4]middle-end: Fix VEC_PERM_EXPR lowering since relaxation of vector sizes

2024-10-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 15, 2024 1:22 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH 2/4]middle-end: Fix VEC_PERM_EXPR lowering since
> relaxation of vector sizes
> 
> On Mon, 14 Oct 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > In GCC 14 VEC_PERM_EXPR was relaxed to be able to permute to a 2x larger
> vector
> > than the size of the input vectors.  However various passes and 
> > transformations
> > were not updated to account for this.
> >
> > I have patches in these area that I will be upstreaming with individual 
> > patches
> > that expose them.
> >
> > This one is that vectlower tries to lower based on the size of the input 
> > vectors
> > rather than the size of the output.  As a consequence it creates an invalid
> > vector of half the size.
> >
> > Luckily we ICE because the resulting nunits doesn't match the vector size.
> >
> > Tests in the AArch64 patch test for this behaviour.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> 
> OK.
> 
> Do you have a testcase btw?

It was relying on the auto-vect of the zero extends as TBLs.
But I'll create one using the gimple front-end.

Today __shufflevector zero pads permutes so you can't generate
this yet (in my next patch series).

I'll write a gimple one and commit with this then.

Thanks,
Tamar
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-generic.cc (lower_vec_perm): Use output vector size instead
> > of input vector when determining output nunits.
> >
> > ---
> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> > index
> 3041fb8fcf235ba86f37ef73aa089330a2fd0b77..f86f7eabb255fde50b30fa3b85
> db367df930f321 100644
> > --- a/gcc/tree-vect-generic.cc
> > +++ b/gcc/tree-vect-generic.cc
> > @@ -1500,6 +1500,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
> >tree mask = gimple_assign_rhs3 (stmt);
> >tree vec0 = gimple_assign_rhs1 (stmt);
> >tree vec1 = gimple_assign_rhs2 (stmt);
> > +  tree res_vect_type = TREE_TYPE (gimple_assign_lhs (stmt));
> >tree vect_type = TREE_TYPE (vec0);
> >tree mask_type = TREE_TYPE (mask);
> >tree vect_elt_type = TREE_TYPE (vect_type);
> > @@ -1512,7 +1513,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
> >location_t loc = gimple_location (gsi_stmt (*gsi));
> >unsigned i;
> >
> > -  if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
> > +  if (!TYPE_VECTOR_SUBPARTS (res_vect_type).is_constant (&elements))
> >  return;
> >
> >if (TREE_CODE (mask) == SSA_NAME)
> > @@ -1672,9 +1673,9 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
> >  }
> >
> >if (constant_p)
> > -constr = build_vector_from_ctor (vect_type, v);
> > +constr = build_vector_from_ctor (res_vect_type, v);
> >else
> > -constr = build_constructor (vect_type, v);
> > +constr = build_constructor (res_vect_type, v);
> >gimple_assign_set_rhs_from_tree (gsi, constr);
> >update_stmt (gsi_stmt (*gsi));
> >  }
> >
> >
> >
> >
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread Richard Biener
On Tue, Oct 15, 2024 at 2:41 PM John Paul Adrian Glaubitz
 wrote:
>
> Hi Maciej,
>
> On Tue, 2024-10-15 at 13:36 +0100, Maciej W. Rozycki wrote:
> > > IMO, we should simply deprecate non-BWX targets. If reload is going
> > > away, then there is no way for non-BWX targets to access reload
> > > internals they require for compilation. As mentioned in the PR,
> > > non-BWX targets are removed from distros anyway, so I guess there is
> > > no point to invest much time to modernize them,
> >
> >  Well, I have a lasting desire to keep non-BWX Alphas running, under Linux
> > in particular, and I'm going to look into any issues around it; reload vs
> > LRA is all software, so things can always be sorted one way or another.
> >
> >  While I've been distracted by other matters lately, such as hardware
> > failures that had to be dealt with urgently, this is now my priority #1
> > and I do hope to have at least some critical initial stuff in with this
> > release cycle (noting that only ~5 weeks are left).
>
> That's great.
>
> While I'm not really an expert in compiler development, I have beefy hardware
> available for GCC and kernel build tests, so if you have any patches for
> testing, please let me know.
>
> >  NB I spoke to Richard about it while at LPC 2024 recently.
>
> OK, good.
>
> FWIW, it *seems* that LRA just works with EV56 as the baseline and the
> following replacements in the code:
>
> s/reload_in_progress/reload_in_progress || lra_in_progress/g

If you can provide -mlra vs. -mno-lra testsuite results as well that
would be interesting.

Does "just work" mean you can build the compiler and its target
libraries?  In this case
I would suggest to go further and pull the trigger now, defaulting to
LRA but allowing
to switch back to reload for testing.  This is so the few people
testing alpha at all can
increase testing coverage - I don't think anybody runs older than EV5 HW.

Is VMS on alpha still a thing btw?  I still see it mentioned in config.gcc

Richard.

> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] dce: Remove FIXME that has not been true for years

2024-10-15 Thread Richard Biener
On Tue, Oct 15, 2024 at 4:07 AM Andrew Pinski  wrote:
>
> This FIXME:
>FIXME: Aggressive mode before PRE doesn't work currently because
>   the dominance info is not invalidated after DCE1.
>
> Has not been true since at least r0-104723-g5ac60b564faa85, which
> added a call to calculate_dominance_info. Plus we have run aggressive mode
> before PRE since r0-89162-g11b08ee9118d10 too. And since
> r0-95499-gb5b8b0ac643d31, dominance information was required even for
> non-aggressive mode.
>
> Also, we have been verifying that dominance information is correct, with no
> need to invalidate it, ever since the ssa branch was merged, so this comment
> has been out of date since even before it was merged in.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-dce.cc (perform_tree_ssa_dce): Remove FIXME note.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssa-dce.cc | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index 3075459e25f..015c17984e1 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -1965,13 +1965,7 @@ make_forwarders_with_degenerate_phis (function *fn)
> In conservative mode, we ignore control dependence and simply declare
> all but the most trivially dead branches necessary.  This mode is fast.
> In aggressive mode, control dependences are taken into account, which
> -   results in more dead code elimination, but at the cost of some time.
> -
> -   FIXME: Aggressive mode before PRE doesn't work currently because
> - the dominance info is not invalidated after DCE1.  This is
> - not an issue right now because we only run aggressive DCE
> - as the last tree SSA pass, but keep this in mind when you
> - start experimenting with pass ordering.  */
> +   results in more dead code elimination, but at the cost of some time.  */
>
>  static unsigned int
>  perform_tree_ssa_dce (bool aggressive)
> --
> 2.43.0
>


[PATCH] RISC-V: Use biggest_mode as mode for constants.

2024-10-15 Thread Robin Dapp
Hi,

in compute_nregs_for_mode we expect that the current variable's mode is
at most as large as the biggest mode to be used for vectorization.

This might not be true for constants as they don't actually have a mode.
In that case, just use the biggest mode so max_number_of_live_regs
returns 1.

This fixes several test cases in the test suite.

Regtested on rv64gcv and letting the CI work out the rest.

Regards
 Robin

gcc/ChangeLog:

PR target/116655

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
Use biggest mode instead of constant's saved mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116655.c: New test.
---
 gcc/config/riscv/riscv-vector-costs.cc | 14 ++
 .../gcc.target/riscv/rvv/autovec/pr116655.c| 11 +++
 2 files changed, 21 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 25570bd4004..67b9e3e8f41 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -194,7 +194,7 @@ compute_local_program_points (
   /* Collect the stmts that is vectorized and mark their program point.  */
   for (i = 0; i < nbbs; i++)
{
- int point = 1;
+ unsigned int point = 1;
  basic_block bb = bbs[i];
	  vec<stmt_point> program_points = vNULL;
  if (dump_enabled_p ())
@@ -489,9 +489,15 @@ max_number_of_live_regs (loop_vec_info loop_vinfo, const 
basic_block bb,
   pair live_range = (*iter).second;
   for (i = live_range.first + 1; i <= live_range.second; i++)
{
- machine_mode mode = TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
-   ? BImode
-   : TYPE_MODE (TREE_TYPE (var));
+ machine_mode mode;
+ if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE)
+   mode = BImode;
+ /* Constants do not have a mode, just use the biggest so
+	    compute_nregs will return 1.  */
+ else if (TREE_CODE (var) == INTEGER_CST)
+   mode = biggest_mode;
+ else
+   mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
= compute_nregs_for_mode (loop_vinfo, mode, biggest_mode, lmul);
  live_vars_vec[i] += nregs;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c
new file mode 100644
index 000..36768e37d00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64imv -mabi=lp64d -mrvv-max-lmul=dynamic" } */
+
+short a[5];
+int b() {
+  int c = 0;
+  for (; c <= 4; c++)
+if (a[c])
+  break;
+  return c;
+}
-- 
2.46.2


Re: [PATCH v2] passes: Remove limit on the number of params

2024-10-15 Thread Richard Biener
On Mon, Oct 14, 2024 at 8:00 PM Andrew Pinski  wrote:
>
> Having a limit of 2 params for NEXT_PASS was just done because I didn't
> think there was a way to handle an arbitrary number of params. But I found
> that we can handle this via a static const variable array (constexpr, so we
> know it is true or false at compile time) and just loop over the array.
>
> Note I keep the NEXT_PASS_WITH_ARG and NEXT_PASS macros around instead of
> always using the NEXT_PASS_WITH_ARGS macro, to make sure these cases get
> optimized for -O0 (stage1).
>
> Tested INSERT_PASS_AFTER/INSERT_PASS_BEFORE manually by changing
> config/i386/i386-passes.def's stv lines to have a 2nd argument and checked
> the resulting pass-instances.def to see that the NEXT_PASS_WITH_ARGS was
> correctly done.
>
> changes from v1:
> * v2: Handle INSERT_PASS_AFTER/INSERT_PASS_BEFORE too.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * gen-pass-instances.awk: Remove the limit of the params.
> * pass_manager.h (NEXT_PASS_WITH_ARG2): Rename to ...
> (NEXT_PASS_WITH_ARGS): This.
> * passes.cc (NEXT_PASS_WITH_ARG2): Rename to ...
> (NEXT_PASS_WITH_ARGS): This and support more than 2 params by using
> a constexpr array.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gen-pass-instances.awk | 22 ++
>  gcc/pass_manager.h |  2 +-
>  gcc/passes.cc  | 13 +
>  3 files changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
> index def09347765..1e5b3f0c8cc 100644
> --- a/gcc/gen-pass-instances.awk
> +++ b/gcc/gen-pass-instances.awk
> @@ -100,7 +100,7 @@ function adjust_linenos(above, increment,   p, i)
>lineno += increment;
>  }
>
> -function insert_remove_pass(line, fnname,  arg3)
> +function insert_remove_pass(line, fnname,  arg3, i)
>  {
>parse_line($0, fnname);
>pass_name = args[1];
> @@ -110,8 +110,13 @@ function insert_remove_pass(line, fnname,  arg3)
>arg3 = args[3];
>sub(/^[ \t]*/, "", arg3);
>new_line = prefix "NEXT_PASS (" arg3;
> -  if (args[4])
> -new_line = new_line "," args[4];
> +  # Add the optional params back.
> +  i = 4;
> +  while (args[i])
> +{
> +  new_line = new_line "," args[i];
> +  i++;
> +}
>new_line = new_line ")" postfix;
>if (!pass_lines[pass_name, pass_num])
>  {
> @@ -195,7 +200,6 @@ function replace_pass(line, fnname, num, 
> i)
>  }
>
>  END {
> -  max_number_args = 2;
>for (i = 1; i < lineno; i++)
>  {
>ret = parse_line(lines[i], "NEXT_PASS");
> @@ -220,13 +224,8 @@ END {
>   if (num_args > 0)
> {
>   printf "NEXT_PASS_WITH_ARG";
> - if (num_args > max_number_args)
> -   {
> - print "ERROR: Only supports up to " max_number_args " args 
> to NEXT_PASS";
> - exit 1;
> -   }
>   if (num_args != 1)
> -   printf num_args;
> +   printf "S";
> }
>   else
> printf "NEXT_PASS";
> @@ -266,8 +265,7 @@ END {
>print "#undef POP_INSERT_PASSES"
>print "#undef NEXT_PASS"
>print "#undef NEXT_PASS_WITH_ARG"
> -  for (i = 2; i <= max_number_args; i++)
> -print "#undef NEXT_PASS_WITH_ARG" i
> +  print "#undef NEXT_PASS_WITH_ARGS"
>print "#undef TERMINATE_PASS_LIST"
>  }
>
> diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
> index f18ae026257..294cdd0b1f7 100644
> --- a/gcc/pass_manager.h
> +++ b/gcc/pass_manager.h
> @@ -130,7 +130,7 @@ private:
>  #define POP_INSERT_PASSES()
>  #define NEXT_PASS(PASS, NUM) opt_pass *PASS ## _ ## NUM
>  #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
> -#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) NEXT_PASS (PASS, NUM)
> +#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...) NEXT_PASS (PASS, NUM)
>  #define TERMINATE_PASS_LIST(PASS)
>
>  #include "pass-instances.def"
> diff --git a/gcc/passes.cc b/gcc/passes.cc
> index b5475fce522..ae80f40b96a 100644
> --- a/gcc/passes.cc
> +++ b/gcc/passes.cc
> @@ -1589,7 +1589,7 @@ pass_manager::pass_manager (context *ctxt)
>  #define POP_INSERT_PASSES()
>  #define NEXT_PASS(PASS, NUM) PASS ## _ ## NUM = NULL
>  #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
> -#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) NEXT_PASS (PASS, NUM)
> +#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...) NEXT_PASS (PASS, NUM)
>  #define TERMINATE_PASS_LIST(PASS)
>  #include "pass-instances.def"
>
> @@ -1636,11 +1636,16 @@ pass_manager::pass_manager (context *ctxt)
>PASS ## _ ## NUM->set_pass_param (0, ARG);   \
>  } while (0)
>
> -#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) \
> +#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...)\
>  do {   \
>NEXT_PASS (PASS, NUM);   \
> -  P
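
(The hunk above is truncated in the archive; a sketch of the variadic
macro it describes, with the exact formatting assumed, is:)

/* Sketch only: collect the optional pass parameters in a constexpr
   array and loop over it; the array size is a compile-time constant,
   so the -O0 (stage1) code stays reasonable.  */
#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...)			\
  do {								\
    NEXT_PASS (PASS, NUM);					\
    static constexpr bool values[] = { __VA_ARGS__ };		\
    unsigned i = 0;						\
    for (bool value : values)					\
      PASS ## _ ## NUM->set_pass_param (i++, value);		\
  } while (0)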

Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread John Paul Adrian Glaubitz
Hi Richard,

On Tue, 2024-10-15 at 14:47 +0200, Richard Biener wrote:
> If you can provide -mlra vs. -mno-lra testsuite results as well that
> would be interesting.

OK, I'll try to provide these.

> Does "just work" mean you can build the compiler and its target
> libraries?

I'm performing a full native bootstrap build with all languages except D,
Go and Rust enabled. This failed with an ICE with the default baseline
at some point, but continues to build for a while now with --with-cpu=ev56.

> In this case I would suggest to go further and pull the trigger now,
> defaulting to LRA but allowing to switch back to reload for testing.
> This is so the few people testing alpha at all can increase testing
> coverage - I don't think anybody runs older than EV5 HW.

Well, BWX requires EV56, not EV5.

> Is VMS on alpha still a thing btw?  I still see it mentioned in config.gcc

OpenVMS on Alpha seems to be still supported by HP [1]. Whether they're using
the latest version of GCC, is a different question through.

Adrian

> [1] https://h41379.www4.hpe.com/openvms/openvms_supportchart.html

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] libstdc++: Implement LWG 3798 for range adaptors [PR106676]

2024-10-15 Thread Patrick Palka
On Mon, 14 Oct 2024, Jonathan Wakely wrote:

> Tested x86_64-linux.
> 
> -- >8 --
> 
> LWG 3798 modified the iterator_category of the iterator types for
> transform_view, join_with_view, zip_transform_view and
> adjacent_transform_view, to allow the iterator's reference type to be an
> rvalue reference.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/106676
>   * include/bits/iterator_concepts.h (__cpp17_fwd_iterator): Use
>   is_reference instead of is_lvalue_reference to allow
>   rvalue references.
>   * include/std/ranges (transform_view:__iter_cat::_S_iter_cat):
>   Likewise.
>   (zip_transform_view::__iter_cat::_S_iter_cat): Likewise.
>   (adjacent_transform_view::__iter_cat::_S_iter_cat): Likewise.
>   (join_with_view::__iter_cat::_S_iter_cat): Likewise.
>   * testsuite/std/ranges/adaptors/transform.cc: Check
>   iterator_category when the transformation function returns an
>   rvalue reference type.
> ---
>  libstdc++-v3/include/bits/iterator_concepts.h|  4 +++-
>  libstdc++-v3/include/std/ranges  | 16 
>  .../testsuite/std/ranges/adaptors/transform.cc   | 16 
>  3 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
> b/libstdc++-v3/include/bits/iterator_concepts.h
> index 490a362cdf1..669d3ddfd1e 100644
> --- a/libstdc++-v3/include/bits/iterator_concepts.h
> +++ b/libstdc++-v3/include/bits/iterator_concepts.h
> @@ -333,10 +333,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   typename incrementable_traits<_Iter>::difference_type>;
>   };
>  
> +// _GLIBCXX_RESOLVE_LIB_DEFECTS
> +// 3798. Rvalue reference and iterator_category
>  template<typename _Iter>
>    concept __cpp17_fwd_iterator = __cpp17_input_iterator<_Iter>
> 	 && constructible_from<_Iter>
> -	 && is_lvalue_reference_v<iter_reference_t<_Iter>>
> +	 && is_reference_v<iter_reference_t<_Iter>>
> 	 && same_as<remove_cvref_t<iter_reference_t<_Iter>>,
> 		    typename indirectly_readable_traits<_Iter>::value_type>
>   && requires(_Iter __it)
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index f0d81cbea0c..941189d65c3 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -1892,7 +1892,9 @@ namespace views::__adaptor
>   using _Fpc = __detail::__maybe_const_t<_Const, _Fp>;
>   using _Base = transform_view::_Base<_Const>;
>   using _Res = invoke_result_t<_Fpc&, range_reference_t<_Base>>;
> - if constexpr (is_lvalue_reference_v<_Res>)
> + // _GLIBCXX_RESOLVE_LIB_DEFECTS
> + // 3798. Rvalue reference and iterator_category
> + if constexpr (is_reference_v<_Res>)
> {
>   using _Cat
> 	    = typename iterator_traits<iterator_t<_Base>>::iterator_category;
> @@ -5047,7 +5049,9 @@ namespace views::__adaptor
> using __detail::__range_iter_cat;
> using _Res = invoke_result_t<__maybe_const_t<_Const, _Fp>&,
>  
> range_reference_t<__maybe_const_t<_Const, _Vs>>...>;
> -   if constexpr (!is_lvalue_reference_v<_Res>)
> +   // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +   // 3798. Rvalue reference and iterator_category
> +   if constexpr (!is_reference_v<_Res>)
>   return input_iterator_tag{};
> else if constexpr ((derived_from<__range_iter_cat<_Vs, _Const>,
>  random_access_iterator_tag> && ...))
> @@ -5820,7 +5824,9 @@ namespace views::__adaptor
>using _Res = invoke_result_t<__unarize<__maybe_const_t<_Const, _Fp>&, 
> _Nm>,
>  range_reference_t<_Base>>;
>    using _Cat = typename iterator_traits<iterator_t<_Base>>::iterator_category;
> -  if constexpr (!is_lvalue_reference_v<_Res>)
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 3798. Rvalue reference and iterator_category
> +  if constexpr (!is_reference_v<_Res>)
>   return input_iterator_tag{};
>else if constexpr (derived_from<_Cat, random_access_iterator_tag>)
>   return random_access_iterator_tag{};
> @@ -7228,7 +7234,9 @@ namespace views::__adaptor
> using _OuterCat = typename 
> iterator_traits<_OuterIter>::iterator_category;
> using _InnerCat = typename 
> iterator_traits<_InnerIter>::iterator_category;
> using _PatternCat = typename 
> iterator_traits<_PatternIter>::iterator_category;
> -	if constexpr (!is_lvalue_reference_v<common_reference_t<iter_reference_t<_InnerIter>,
> +	// _GLIBCXX_RESOLVE_LIB_DEFECTS
> +	// 3798. Rvalue reference and iterator_category
> +	if constexpr (!is_reference_v<common_reference_t<iter_reference_t<_InnerIter>,
> 								 iter_reference_t<_PatternIter>>>)

This line is misaligned with the previous one now.  LGTM besides that

>   return input_iterator_tag{};
> else if constexpr (derived_from<_OuterCat, bidirectional_iterator_tag>
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc 
> b/libstdc++-v3/testsuite

[PATCH v2] Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V

2024-10-15 Thread Yangyu Chen
Some architectures may use ',' within the attribute string, where it is not
meant as the separator between different targets. To avoid this conflict, we
introduce a new macro TARGET_CLONES_ATTR_SEPARATOR to separate different
clones.

As an example, according to the RISC-V C-API Specification [1], RISC-V allows
',' in the attribute string in the "arch=" option to specify one or more
ISA extensions for the same target function, which conflicts with the
default separator used to separate different clones. This patch introduces
TARGET_CLONES_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator,
since '#' is not allowed in the target_clones option string.

[1] 
https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string
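
A hedged usage sketch (the extension names are assumed): each clone is a
separate attribute string, but a single "arch=" value may itself contain
commas, which is why ',' cannot double as the internal clone separator:

__attribute__ ((target_clones ("default", "arch=+zba,+zbb")))
int
foo (void)
{
  return 42;
}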

gcc/ChangeLog:

* defaults.h (TARGET_CLONES_ATTR_SEPARATOR): Define new macro.
* multiple_target.cc (get_attr_str): Use
  TARGET_CLONES_ATTR_SEPARATOR to separate attributes.
(separate_attrs): Likewise.
* config/riscv/riscv.h (TARGET_CLONES_ATTR_SEPARATOR): Define
  TARGET_CLONES_ATTR_SEPARATOR for RISC-V.
---
 gcc/config/riscv/riscv.h |  5 +
 gcc/defaults.h   |  4 
 gcc/multiple_target.cc   | 19 ---
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index ca1b8329cdc..2ff9c1024f3 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1298,4 +1298,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
STACK_BOUNDARY / BITS_PER_UNIT)\
 : (crtl->outgoing_args_size + STACK_POINTER_OFFSET))
 
+/* According to the RISC-V C API, the arch string may contain ','.  To avoid
+   the conflict with the default separator, we choose '#' as the separator for
+   the target attribute.  */
+#define TARGET_CLONES_ATTR_SEPARATOR '#'
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/defaults.h b/gcc/defaults.h
index ac2d25852ab..918e3ec2f24 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -874,6 +874,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #define TARGET_HAS_FMV_TARGET_ATTRIBUTE 1
 #endif
 
+/* Select an attribute separator for function multiversioning.  */
+#ifndef TARGET_CLONES_ATTR_SEPARATOR
+#define TARGET_CLONES_ATTR_SEPARATOR ','
+#endif
 
 /* Select a format to encode pointers in exception handling data.  We
prefer those that result in fewer dynamic relocations.  Assume no
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 1fdd279da04..c1e358dfc1e 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -180,7 +180,7 @@ create_dispatcher_calls (struct cgraph_node *node)
 }
 }
 
-/* Create string with attributes separated by comma.
+/* Create string with attributes separated by TARGET_CLONES_ATTR_SEPARATOR.
Return number of attributes.  */
 
 static int
@@ -194,17 +194,21 @@ get_attr_str (tree arglist, char *attr_str)
 {
   const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
   size_t len = strlen (str);
-  for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ','))
+  for (const char *p = strchr (str, TARGET_CLONES_ATTR_SEPARATOR);
+  p;
+  p = strchr (p + 1, TARGET_CLONES_ATTR_SEPARATOR))
argnum++;
   memcpy (attr_str + str_len_sum, str, len);
-  attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
+  attr_str[str_len_sum + len]
+   = TREE_CHAIN (arg) ? TARGET_CLONES_ATTR_SEPARATOR : '\0';
   str_len_sum += len + 1;
   argnum++;
 }
   return argnum;
 }
 
-/* Return number of attributes separated by comma and put them into ARGS.
+/* Return number of attributes separated by TARGET_CLONES_ATTR_SEPARATOR
+   and put them into ARGS.
If there is no DEFAULT attribute return -1.
If there is an empty string in attribute return -2.
If there are multiple DEFAULT attributes return -3.
@@ -215,9 +219,10 @@ separate_attrs (char *attr_str, char **attrs, int attrnum)
 {
   int i = 0;
   int default_count = 0;
+  static const char separator_str[] = { TARGET_CLONES_ATTR_SEPARATOR, 0 };
 
-  for (char *attr = strtok (attr_str, ",");
-   attr != NULL; attr = strtok (NULL, ","))
+  for (char *attr = strtok (attr_str, separator_str);
+   attr != NULL; attr = strtok (NULL, separator_str))
 {
   if (strcmp (attr, "default") == 0)
{
@@ -305,7 +310,7 @@ static bool
 expand_target_clones (struct cgraph_node *node, bool definition)
 {
   int i;
-  /* Parsing target attributes separated by comma.  */
+  /* Parsing target attributes separated by TARGET_CLONES_ATTR_SEPARATOR.  */
   tree attr_target = lookup_attribute ("target_clones",
   DECL_ATTRIBUTES (node->decl));
   /* No targets specified.  */
-- 
2.45.2



Re: [PATCH] c, v3: Implement C2Y N3355 - Named Loops [PR117022]

2024-10-15 Thread Joseph Myers
On Tue, 15 Oct 2024, Jakub Jelinek wrote:

> Here is a new version of the patch, tested on the dg.exp=*named-loops*
> tests fine, I think it doesn't need more testing given that it is just
> comment changes in code plus testsuite changes.

This version is OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] Introduce TARGET_FMV_ATTR_SEPARATOR

2024-10-15 Thread Yangyu Chen



> On Oct 15, 2024, at 20:11, Andrew Carlotti  wrote:
> 
> On Tue, Oct 15, 2024 at 02:18:43PM +0800, Yangyu Chen wrote:
>> Some architectures may use ',' in the attribute string, but it is not
>> used as the separator for different targets. To avoid conflict, we
>> introduce a new macro TARGET_FMV_ATTR_SEPARATOR to separate different
>> clones.
> 
> This is only for the target_clones attribute, so how about calling it
> TARGET_CLONES_ATTR_SEPARATOR instead (or pluralised - see below)?
> 

Sounds like a good idea. I chose to use TARGET_CLONES_ATTR_SEPARATOR in the
next revision.

Link: 
https://patchwork.sourceware.org/project/gcc/patch/20241015181607.3689413-1-chenyan...@isrc.iscas.ac.cn/

>> As an example, according to RISC-V C-API Specification [1], RISC-V allows
>> ',' in the attribute string in the "arch=" option to specify one or more
>> ISA extensions for the same target function, which conflicts with the
>> default separator used to separate different clones. This patch introduces
>> TARGET_FMV_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator,
>> since '#' is not allowed in the target_clones option string.
>> 
>> [1] 
>> https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string
>> 
>> gcc/ChangeLog:
>> 
>>* defaults.h (TARGET_FMV_ATTR_SEPARATOR): Define new macro.
>>* multiple_target.cc (get_attr_str): Use
>>  TARGET_FMV_ATTR_SEPARATOR to separate attributes.
>>(separate_attrs): Likewise.
>>* config/riscv/riscv.h (TARGET_FMV_ATTR_SEPARATOR): Define
>>  TARGET_FMV_ATTR_SEPARATOR for RISC-V.
>> ---
>> gcc/config/riscv/riscv.h |  5 +
>> gcc/defaults.h   |  4 
>> gcc/multiple_target.cc   | 19 ---
>> 3 files changed, 21 insertions(+), 7 deletions(-)
>> 
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index ca1b8329cdc..858cab72a4c 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1298,4 +1298,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
>> (void);
>> STACK_BOUNDARY / BITS_PER_UNIT)\
>> : (crtl->outgoing_args_size + STACK_POINTER_OFFSET))
>> 
>> +/* According to the RISC-V C API, the arch string may contain ','.  To avoid
>> +   the conflict with the default separator, we choose '#' as the separator 
>> for
>> +   the target attribute.  */
>> +#define TARGET_FMV_ATTR_SEPARATOR '#'
>> +
>> #endif /* ! GCC_RISCV_H */
>> diff --git a/gcc/defaults.h b/gcc/defaults.h
>> index ac2d25852ab..f451efcb33e 100644
>> --- a/gcc/defaults.h
>> +++ b/gcc/defaults.h
>> @@ -874,6 +874,10 @@ see the files COPYING3 and COPYING.RUNTIME 
>> respectively.  If not, see
>> #define TARGET_HAS_FMV_TARGET_ATTRIBUTE 1
>> #endif
>> 
>> +/* Select an attribute separator for function multiversioning.  */
>> +#ifndef TARGET_FMV_ATTR_SEPARATOR
>> +#define TARGET_FMV_ATTR_SEPARATOR ','
>> +#endif
>> 
>> /* Select a format to encode pointers in exception handling data.  We
>>prefer those that result in fewer dynamic relocations.  Assume no
>> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
>> index 1fdd279da04..5a056b44571 100644
>> --- a/gcc/multiple_target.cc
>> +++ b/gcc/multiple_target.cc
>> @@ -180,7 +180,7 @@ create_dispatcher_calls (struct cgraph_node *node)
>> }
>> }
>> 
>> -/* Create string with attributes separated by comma.
>> +/* Create string with attributes separated by TARGET_FMV_ATTR_SEPARATOR.
>>Return number of attributes.  */
>> 
>> static int
>> @@ -194,17 +194,21 @@ get_attr_str (tree arglist, char *attr_str)
>> {
>>   const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
>>   size_t len = strlen (str);
>> -  for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ','))
>> +  for (const char *p = strchr (str, TARGET_FMV_ATTR_SEPARATOR);
>> +p;
>> +p = strchr (p + 1, TARGET_FMV_ATTR_SEPARATOR))
>> argnum++;
>>   memcpy (attr_str + str_len_sum, str, len);
>> -  attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
>> +  attr_str[str_len_sum + len]
>> + = TREE_CHAIN (arg) ? TARGET_FMV_ATTR_SEPARATOR : '\0';
>>   str_len_sum += len + 1;
>>   argnum++;
>> }
>>   return argnum;
>> }
>> 
>> -/* Return number of attributes separated by comma and put them into ARGS.
>> +/* Return number of attributes separated by TARGET_FMV_ATTR_SEPARATOR and 
>> put
>> +   them into ARGS.
>>If there is no DEFAULT attribute return -1.
>>If there is an empty string in attribute return -2.
>>If there are multiple DEFAULT attributes return -3.
>> @@ -215,9 +219,10 @@ separate_attrs (char *attr_str, char **attrs, int 
>> attrnum)
>> {
>>   int i = 0;
>>   int default_count = 0;
>> +  char separator_str[] = {TARGET_FMV_ATTR_SEPARATOR, '\0'};
> 
> How about defining the macro as a string (and appending an S to the name - 
> e.g.
> TARGET_CLONES_ATTR_SEPARATORS)?
> 

I didn't find a clean way in C to build a char

Re: [PATCH v2] bpf: make sure CO-RE relocs are typed with struct BTF_KIND_STRUCT

2024-10-15 Thread David Faust


On 10/14/24 11:04, Cupertino Miranda wrote:
> Hi everyone,
> 
> Here is the v2 for the patch in this thread:
>   https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665378.html
> Please noticed that commit message was adapted to new content.
> 
> Regards,
> Cupertino

Hi Cupertino,

Thanks for the patch. The fix LGTM.

I have some comments about the new test below. But it is ok for now, and
I think addressing them properly is out of scope of this patch. This fix
will be good to have. So the patch is OK.

> 
> Based on observation within bpf-next selftests and comparison of GCC-
> and clang-compiled code, the BPF loader expects all CO-RE relocations to
> point to non-const and non-volatile BTF type nodes.
> ---
>  gcc/btfout.cc |  2 +-
>  gcc/config/bpf/btfext-out.cc  |  7 
>  gcc/ctfc.h|  2 +
>  .../gcc.target/bpf/core-attr-const.c  | 40 +++
>  4 files changed, 50 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-const.c
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index 8b91bde8798..24f62ec1a52 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -167,7 +167,7 @@ get_btf_kind (uint32_t ctf_kind)
>  
>  /* Convenience wrapper around get_btf_kind for the common case.  */
>  
> -static uint32_t
> +uint32_t
>  btf_dtd_kind (ctf_dtdef_ref dtd)
>  {
>if (!dtd)
> diff --git a/gcc/config/bpf/btfext-out.cc b/gcc/config/bpf/btfext-out.cc
> index 095c35b894b..30266a2d384 100644
> --- a/gcc/config/bpf/btfext-out.cc
> +++ b/gcc/config/bpf/btfext-out.cc
> @@ -320,6 +320,13 @@ bpf_core_reloc_add (const tree type, const char * 
> section_name,
>ctf_container_ref ctfc = ctf_get_tu_ctfc ();
>ctf_dtdef_ref dtd = ctf_lookup_tree_type (ctfc, type);
>  
> +  /* Make sure CO-RE type is never the const or volatile version.  */
> +  if ((btf_dtd_kind (dtd) == BTF_KIND_CONST
> +   || btf_dtd_kind (dtd) == BTF_KIND_VOLATILE)
> +  && kind >= BPF_RELO_FIELD_BYTE_OFFSET
> +  && kind <= BPF_RELO_FIELD_RSHIFT_U64)
> +dtd = dtd->ref_type;
> +
>/* Buffer the access string in the auxiliary strtab.  */
>bpfcr->bpfcr_astr_off = 0;
>gcc_assert (accessor != NULL);
> diff --git a/gcc/ctfc.h b/gcc/ctfc.h
> index 41e1169f271..e5967f590f9 100644
> --- a/gcc/ctfc.h
> +++ b/gcc/ctfc.h
> @@ -465,4 +465,6 @@ extern void btf_mark_type_used (tree);
>  extern int ctfc_get_dtd_srcloc (ctf_dtdef_ref, ctf_srcloc_ref);
>  extern int ctfc_get_dvd_srcloc (ctf_dvdef_ref, ctf_srcloc_ref);
>  
> +extern uint32_t btf_dtd_kind (ctf_dtdef_ref dtd);
> +
>  #endif /* GCC_CTFC_H */
> diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-const.c 
> b/gcc/testsuite/gcc.target/bpf/core-attr-const.c
> new file mode 100644
> index 000..da6113a3faf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/core-attr-const.c
> @@ -0,0 +1,40 @@
> +/* Test to make sure CO-RE access relocs point to non const versions of the
> +   type.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -dA -gbtf -mco-re -masm=normal" } */
> +
> +struct S {
> +  int a;
> +  int b;
> +  int c;
> +} __attribute__((preserve_access_index));
> +
> +void
> +func (struct S * s)
> +{
> +  int *x;
> +  int *y;
> +  int *z;
> +  struct S tmp;
> +  const struct S *cs = s;
> +  volatile struct S *vs = s;
> +
> +  /* 0:2 */
> +  x = &(s->c);
> +
> +  /* 0:1 */
> +  y = (int *) &(cs->b);
> +
> +  /* 0:1 */
> +  z = (int *) &(vs->b);
> +
> +  *x = 4;
> +  *y = 4;
> +  *z = 4;
> +}
> +
> +/* Both const and non const struct type should have the same bpfcr_type. */
> +/* { dg-final { scan-assembler-times "0x1\t# bpfcr_type \\(struct S\\)" 1 } 
> } */
> +/* { dg-final { scan-assembler-times "0x1\t# bpfcr_type \\(const struct 
> S\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "0x1\t# bpfcr_type \\(volatile struct 
> S\\)" 1 } } */

The issue with these checks is that you are checking the exact BTF ID,
which is, in theory, not guaranteed to remain consistent between
different runs of the compiler. Even though in practice it does for now,
this could be broken by completely unrelated changes in the
DWARF-CTF-BTF pipeline.

But, I also understand that the purpose is to make sure these three
relocation records point to exactly the same BTF type. And currently we
do not have a good way to check that.

In addition, it is misleading that the asm comments refer to the
qualified types, which no longer match the actual BTF ID in the record.
I see that this is coming from btfext_out.cc:output_btfext_core_sections
which does the pretty-print from the original type expression rather
than the BTF ID that is used.

I wonder if it would instead make sense to use the btf_asm_type_ref
function from btfout.cc to asm the type ref (and the comment), which
will do so based on the type ID rather than the original type
expression.  Then for this test we can take a similar approach to the
BTF tests: drop the check on the conc

Re: [PATCH] Match: Remove dup match pattern for signed_integer_sat_sub [PR117141]

2024-10-15 Thread Richard Biener
On Tue, Oct 15, 2024 at 1:31 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix the warning as below:
>
> /home/slyfox/dev/git/gcc/gcc/match.pd:3424:3 warning: duplicate pattern
>  (cond^ (ne (imagpart (IFN_SUB_OVERFLOW:c@2 @0 @1)) integer_zerop)
>   ^
> /home/slyfox/dev/git/gcc/gcc/match.pd:3397:3 warning: previous pattern
> defined here
>  (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
>
> The second has an optional nop_convert which already covers the first one,
> thus remove the dup.

OK.

> PR middle-end/117141
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Remove the dup pattern for signed SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ee53c25cef9..22fad1a8757 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3395,7 +3395,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> The T and UT are type pair like T=int8_t, UT=uint8_t.  */
>  (match (signed_integer_sat_sub @0 @1)
>   (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
> -   (bit_xor:c (negate (convert (lt @0 integer_zerop)))
> +   (bit_xor:c (nop_convert?
> +   (negate (nop_convert? (convert (lt @0 integer_zerop)
>max_value)
> (realpart @2))
>   (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> @@ -3417,18 +3418,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> @2)
>   (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type
>
> -/* Signed saturation sub, case 5:
> -   Z = .SUB_OVERFLOW (X, Y)
> -   SAT_S_SUB = IMAGPART_EXPR (Z) != 0 ? (-(T)(X < 0) ^ MAX) : minus;  */
> -(match (signed_integer_sat_sub @0 @1)
> - (cond^ (ne (imagpart (IFN_SUB_OVERFLOW:c@2 @0 @1)) integer_zerop)
> -   (bit_xor:c (nop_convert?
> -   (negate (nop_convert? (convert (lt @0 integer_zerop)
> -  max_value)
> -   (realpart @2))
> - (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1
> -
>  /* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
>  (match (unsigned_integer_sat_trunc @0)
> --
> 2.43.0
>


Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-10-15 Thread Jennifer Schmitz


> On 1 Oct 2024, at 21:30, Tamar Christina  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Jennifer,
> 
>> -Original Message-
>> From: Jennifer Schmitz 
>> Sent: Tuesday, September 24, 2024 9:23 AM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Tamar Christina ; Richard Sandiford
>> ; Kyrylo Tkachov 
>> Subject: Re: [RFC][PATCH] AArch64: Remove
>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>> 
>> 
>> 
>>> On 28 Aug 2024, at 14:56, Kyrylo Tkachov  wrote:
>>> 
>>> 
>>> 
 On 28 Aug 2024, at 10:27, Tamar Christina  wrote:
 
 External email: Use caution opening links or attachments
 
 
> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Wednesday, August 28, 2024 8:55 AM
> To: Tamar Christina 
> Cc: Richard Sandiford ; Jennifer Schmitz
> ; gcc-patches@gcc.gnu.org; Kyrylo Tkachov
> 
> Subject: Re: [RFC][PATCH] AArch64: Remove
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> 
> Hi all,
> 
> Thanks to Jennifer for proposing a patch and Tamar and Richard for digging
>> into it.
> 
>> On 27 Aug 2024, at 13:16, Tamar Christina 
>> wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Tuesday, August 27, 2024 11:46 AM
>>> To: Tamar Christina 
>>> Cc: Jennifer Schmitz ; gcc-patches@gcc.gnu.org;
>> Kyrylo
>>> Tkachov 
>>> Subject: Re: [RFC][PATCH] AArch64: Remove
>>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>>> 
>>> Tamar Christina  writes:
 Hi Jennifer,
 
> -Original Message-
> From: Jennifer Schmitz 
> Sent: Friday, August 23, 2024 1:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [RFC][PATCH] AArch64: Remove
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> 
> This patch removes the
>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> tunable and
> use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend
> the
> default.
>>> 
>>> Thanks for doing this.  This has been on my TODO list ever since the
>>> tunable was added.
>>> 
>>> The history is that these "new" costs were originally added in stage 4
>>> of GCC 11 for Neoverse V1.  Since the costs were added so late, it 
>>> wasn't
>>> appropriate to change the behaviour for any other core.  All the new 
>>> code
>>> was therefore gated on this option.
>>> 
>>> The new costs do two main things:
>>> 
>>> (1) use throughput-based calculations where available, including to 
>>> choose
>>> between Advanced SIMD and SVE
>>> 
>>> (2) try to make the latency-based costs more precise, by looking more
>> closely
>>> at the provided stmt_info
>>> 
>>> Old cost models won't be affected by (1) either way, since they don't
>>> provide any throughput information.  But they should in principle 
>>> benefit
>>> from (2).  So...
>>> 
> To that end, the function aarch64_use_new_vector_costs_p and its uses
> were
> removed. Additionally, guards were added prevent nullpointer
>> dereferences
> of
> fields in cpu_vector_cost.
> 
 
 I'm not against this change, but it does mean that we now switch old 
 Adv.
> SIMD
 cost models as well to the new throughput based cost models.  That 
 means
> that
 -mcpu=generic now behaves differently, and -mcpu=neoverse-n1 and I
>> think
 some distros explicitly use this (I believe yocto for instance does).
>>> 
>>> ...it shouldn't mean that we start using throughput-based models for
>>> cortexa53 etc., since there's no associated issue info.
>> 
>> Yes, I was using throughput based model as a name.  But as you indicated 
>> in
>> (2)
>> it does change the latency calculation.
>> 
>> My question was because of things in e.g. aarch64_adjust_stmt_cost and
> friends,
>> e.g. aarch64_multiply_add_p changes the cost between FMA SIMD vs scalar.
>> 
>> So my question..
>> 
>>> 
 Have we validated that the old generic cost model still behaves 
 sensibly
>> with
> this
>>> change?
>> 
>> is still valid I think, we *are* changing the cost for all models,
>> and while they should indeed be more accurate, there could be knock on
>> effects.
>> 
> 
> We can run SPEC on a Grace system with -mcpu=generic to see what the 
> effect
>> is,
> but wider benchmarking would be more appropriate. Can you help with that
> Tamar once we agree on the other implementation details in this patch?
> 
 
 Sure that's not a problem.  Just ping me when you have a

Re: [PATCH]AArch64 re-enable memory access costing after SLP change.

2024-10-15 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> While chasing down a costing difference between SLP and non-SLP for memory
> access costing I noticed that at some point the SLP and non-SLP costing have
> diverged.  It used to be we only supported LOAD_LANES in SLP and so the 
> non-SLP
> costing was working fine.
>
> But with the change to SLP only we now lost costing.
>
> It looks like the vectorizer for non-SLP stores the VMAT type in
> STMT_VINFO_MEMORY_ACCESS_TYPE on the stmt_info, but for SLP it stores it in
> SLP_TREE_MEMORY_ACCESS_TYPE which is on the SLP node itself.
>
> While my first attempt of a patch was to just also store the VMAT in the
> stmt_info https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665295.html
> Richi pointed out that this goes wrong when the same access is used Hybrid.
>
> And so we have to do a backend specific fix.  To help out other backends this
> also introduces a generic helper function suggested by Richi in that patch
> (I hope that's ok.. I didn't want to split out just the helper.)
>
> This successfully restores VMAT based costing in the new SLP only world.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * tree-vectorizer.h (vect_mem_access_type): New.
>   * config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use it.
>   (aarch64_detect_vector_stmt_subtype): Likewise.
>   (aarch64_adjust_stmt_cost): Likewise.
>   (aarch64_vector_costs::count_ops): Likewise.
>   (aarch64_vector_costs::add_stmt_cost): Make SLP node named.

Looks OK, but some formatting trivia:

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 102680a0efca1ce928e6945033c01cfb68a65152..055b0ff47c68dc5e7560debe5a29dcdc9df21f8c
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16278,7 +16278,7 @@ public:
>  private:
>void record_potential_advsimd_unrolling (loop_vec_info);
>void analyze_loop_vinfo (loop_vec_info);
> -  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info,
> +  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info, slp_tree,
> aarch64_vec_op_count *);
>fractional_cost adjust_body_cost_sve (const aarch64_vec_op_count *,
>   fractional_cost, unsigned int,
> @@ -16599,7 +16599,8 @@ aarch64_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
> vector of an LD[234] or ST[234] operation.  Return the total number of
> vectors (2, 3 or 4) if so, otherwise return a value outside that range.  
> */
>  static int
> -aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info 
> stmt_info)
> +aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info 
> stmt_info,
> +  slp_tree node)

Normally the comment should be updated to mention NODE.  But it probably
isn't worth it, since presumably after the SLP transition, everything
could be keyed off the node only (meaning a larger rewrite).

>  {
>if ((kind == vector_load
> || kind == unaligned_load
> @@ -16609,7 +16610,7 @@ aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info)
>  {
>stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
>if (stmt_info
> -   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
> +   && vect_mem_access_type (stmt_info, node) == VMAT_LOAD_STORE_LANES)
>   return DR_GROUP_SIZE (stmt_info);
>  }
>return 0;
> @@ -16847,14 +16848,15 @@ aarch64_detect_scalar_stmt_subtype (vec_info 
> *vinfo, vect_cost_for_stmt kind,
>  }
>  
>  /* STMT_COST is the cost calculated by aarch64_builtin_vectorization_cost
> -   for the vectorized form of STMT_INFO, which has cost kind KIND and which
> -   when vectorized would operate on vector type VECTYPE.  Try to subdivide
> -   the target-independent categorization provided by KIND to get a more
> +   for the vectorized form of STMT_INFO possibly using SLP node NODE, which 
> has cost
> +   kind KIND and which when vectorized would operate on vector type VECTYPE. 
>  Try to
> +   subdivide the target-independent categorization provided by KIND to get a 
> more
> accurate cost.  WHERE specifies where the cost associated with KIND
> occurs.  */

Needs to be reflowed to 80 characters (all the "+" lines are longer).

>  static fractional_cost
>  aarch64_detect_vector_stmt_subtype (vec_info *vinfo, vect_cost_for_stmt kind,
> - stmt_vec_info stmt_info, tree vectype,
> + stmt_vec_info stmt_info, slp_tree node,
> + tree vectype,
>   enum vect_cost_model_location where,
>   fractional_cost stmt_cost)
>  {
> [...]
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> 2775d873ca42436fb6b6789ca8
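
(The tree-vectorizer.h hunk is truncated above; the generic helper named in
the ChangeLog might look like the following sketch, with the body assumed:)

/* Sketch only: return the memory access type for STMT_INFO, taking it
   from the SLP node when one is given (pure SLP) and from the stmt_info
   otherwise (hybrid or non-SLP).  */
inline vect_memory_access_type
vect_mem_access_type (stmt_vec_info stmt_info, slp_tree node)
{
  return node ? SLP_TREE_MEMORY_ACCESS_TYPE (node)
	      : STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info);
}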

Re: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, October 14, 2024 7:34 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
>> Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends using
>> VEC_PERM_EXPR
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This patch series adds support for a target to do a direct conversion
>> > for zero extends using permutes.
>> >
>> > To do this it uses a target hook use_permute_for_promotion which must be
>> > implemented by targets.  This hook is used to indicate:
>> >
>> >  1. can a target do this for the given modes.
>> >  2. is it profitable for the target to do it.
>> >  3. can the target convert between various vector modes with a 
>> > VIEW_CONVERT.
>> >
>> > Using permutations has a big benefit for multi-step zero extensions
>> > because they both reduce the number of needed instructions and increase
>> > throughput, as the dependency chain is removed.
>> >
>> > Concretely on AArch64 this changes:
>> >
>> > void test4(unsigned char *x, long long *y, int n) {
>> > for(int i = 0; i < n; i++) {
>> > y[i] = x[i];
>> > }
>> > }
>> >
>> > from generating:
>> >
>> > .L4:
>> > ldr q30, [x4], 16
>> > add x3, x3, 128
>> > zip1v1.16b, v30.16b, v31.16b
>> > zip2v30.16b, v30.16b, v31.16b
>> > zip1v2.8h, v1.8h, v31.8h
>> > zip1v0.8h, v30.8h, v31.8h
>> > zip2v1.8h, v1.8h, v31.8h
>> > zip2v30.8h, v30.8h, v31.8h
>> > zip1v26.4s, v2.4s, v31.4s
>> > zip1v29.4s, v0.4s, v31.4s
>> > zip1v28.4s, v1.4s, v31.4s
>> > zip1v27.4s, v30.4s, v31.4s
>> > zip2v2.4s, v2.4s, v31.4s
>> > zip2v0.4s, v0.4s, v31.4s
>> > zip2v1.4s, v1.4s, v31.4s
>> > zip2v30.4s, v30.4s, v31.4s
>> > stp q26, q2, [x3, -128]
>> > stp q28, q1, [x3, -96]
>> > stp q29, q0, [x3, -64]
>> > stp q27, q30, [x3, -32]
>> > cmp x4, x5
>> > bne .L4
>> >
>> > and instead we get:
>> >
>> > .L4:
>> > add x3, x3, 128
>> > ldr q23, [x4], 16
>> > tbl v5.16b, {v23.16b}, v31.16b
>> > tbl v4.16b, {v23.16b}, v30.16b
>> > tbl v3.16b, {v23.16b}, v29.16b
>> > tbl v2.16b, {v23.16b}, v28.16b
>> > tbl v1.16b, {v23.16b}, v27.16b
>> > tbl v0.16b, {v23.16b}, v26.16b
>> > tbl v22.16b, {v23.16b}, v25.16b
>> > tbl v23.16b, {v23.16b}, v24.16b
>> > stp q5, q4, [x3, -128]
>> > stp q3, q2, [x3, -96]
>> > stp q1, q0, [x3, -64]
>> > stp q22, q23, [x3, -32]
>> > cmp x4, x5
>> > bne .L4
>> >
>> > Tests are added in the AArch64 patch introducing the hook.  The testsuite 
>> > also
>> > already had about 800 runtime tests that get affected by this.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* target.def (use_permute_for_promotion): New.
>> >* doc/tm.texi.in: Document it.
>> >* doc/tm.texi: Regenerate.
>> >* targhooks.cc (default_use_permute_for_promotion): New.
>> >* targhooks.h (default_use_permute_for_promotion): New.
>> >(vectorizable_conversion): Support direct conversion with permute.
>> >* tree-vect-stmts.cc (vect_create_vectorized_promotion_stmts): Likewise.
>> >(supportable_widening_operation): Likewise.
>> >(vect_gen_perm_mask_any): Allow vector permutes where input registers
>> >are half the width of the result per the GCC 14 relaxation of
>> >VEC_PERM_EXPR.
>> >
>> > ---
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index
>> 4deb3d2c283a2964972b94f434370a6f57ea816a..e8192590ac14005bf7cb5f73
>> 1c16ee7eacb78143 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -6480,6 +6480,15 @@ type @code{internal_fn}) should be considered
>> expensive when the mask is
>> >  all zeros.  GCC can then try to branch around the instruction instead.
>> >  @end deftypefn
>> >
>> > +@deftypefn {Target Hook} bool
>> TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION (const_tree
>> @var{in_type}, const_tree @var{out_type})
>> > +This hook returns true if the operation promoting @var{in_type} to
>> > +@var{out_type} should be done as a vector permute.  If @var{out_type} is
>> > +a signed type the operation will be done as the related unsigned type and
>> > +converted to @var{out_type}.  If the target supports the needed permute,
>> > +is able to convert unsigned(@var{out_type}) to @var{out_type} and it is
>> > +beneficial to the hook should return true, else false should be returned.
>> > +@end deftypefn
>> 
>> Just a review of the documentation, but: i

[PATCH v2] alpha: Add -mlra option

2024-10-15 Thread John Paul Adrian Glaubitz
PR target/66207
* config/alpha/alpha.opt (mlra): New target option.
* config/alpha/alpha.cc (alpha_use_lra_p): New function.
(TARGET_LRA_P): Use it.
* config/alpha/alpha.opt.urls: Regenerate.

Signed-off-by: John Paul Adrian Glaubitz 
---
 gcc/config/alpha/alpha.cc   | 10 +-
 gcc/config/alpha/alpha.opt  |  5 +
 gcc/config/alpha/alpha.opt.urls |  2 ++
 3 files changed, 16 insertions(+), 1 deletion(-)

v2:
- Rephrase patch short summary

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 74631a41693..218c66b6090 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -9941,6 +9941,14 @@ alpha_c_mode_for_floating_type (enum tree_index ti)
   return default_mode_for_floating_type (ti);
 }
 
+/* Implement TARGET_LRA_P.  */
+
+static bool
+alpha_use_lra_p ()
+{
+  return alpha_lra_p;
+}
+
 /* Initialize the GCC target structure.  */
 #if TARGET_ABI_OPEN_VMS
 # undef TARGET_ATTRIBUTE_TABLE
@@ -10124,7 +10132,7 @@ alpha_c_mode_for_floating_type (enum tree_index ti)
 #endif
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P alpha_use_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P alpha_legitimate_address_p
diff --git a/gcc/config/alpha/alpha.opt b/gcc/config/alpha/alpha.opt
index 62543d2689c..a4d6d58724a 100644
--- a/gcc/config/alpha/alpha.opt
+++ b/gcc/config/alpha/alpha.opt
@@ -89,6 +89,11 @@ mlarge-text
 Target RejectNegative InverseMask(SMALL_TEXT)
 Emit indirect branches to local functions.
 
+mlra
+Target Var(alpha_lra_p) Undocumented
Use LRA for reload instead of the old reload framework.  This option is
+experimental, and it may be removed in future versions of the compiler.
+
 mtls-kernel
 Target Mask(TLS_KERNEL)
 Emit rdval instead of rduniq for thread pointer.
diff --git a/gcc/config/alpha/alpha.opt.urls b/gcc/config/alpha/alpha.opt.urls
index a55c08328c3..916a3013f63 100644
--- a/gcc/config/alpha/alpha.opt.urls
+++ b/gcc/config/alpha/alpha.opt.urls
@@ -44,6 +44,8 @@ UrlSuffix(gcc/DEC-Alpha-Options.html#index-msmall-data)
 mlarge-data
 UrlSuffix(gcc/DEC-Alpha-Options.html#index-mlarge-data)
 
+; skipping UrlSuffix for 'mlra' due to finding no URLs
+
 msmall-text
 UrlSuffix(gcc/DEC-Alpha-Options.html#index-msmall-text)
 
-- 
2.39.5



[PATCH] alpha: Add -mlra

2024-10-15 Thread John Paul Adrian Glaubitz
PR target/66207
* config/alpha/alpha.opt (mlra): New target option.
* config/alpha/alpha.cc (alpha_use_lra_p): New function.
(TARGET_LRA_P): Use it.
* config/alpha/alpha.opt.urls: Regenerate.

Signed-off-by: John Paul Adrian Glaubitz 
---
 gcc/config/alpha/alpha.cc   | 10 +-
 gcc/config/alpha/alpha.opt  |  5 +
 gcc/config/alpha/alpha.opt.urls |  2 ++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 74631a41693..218c66b6090 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -9941,6 +9941,14 @@ alpha_c_mode_for_floating_type (enum tree_index ti)
   return default_mode_for_floating_type (ti);
 }
 
+/* Implement TARGET_LRA_P.  */
+
+static bool
+alpha_use_lra_p ()
+{
+  return alpha_lra_p;
+}
+
 /* Initialize the GCC target structure.  */
 #if TARGET_ABI_OPEN_VMS
 # undef TARGET_ATTRIBUTE_TABLE
@@ -10124,7 +10132,7 @@ alpha_c_mode_for_floating_type (enum tree_index ti)
 #endif
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P alpha_use_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P alpha_legitimate_address_p
diff --git a/gcc/config/alpha/alpha.opt b/gcc/config/alpha/alpha.opt
index 62543d2689c..a4d6d58724a 100644
--- a/gcc/config/alpha/alpha.opt
+++ b/gcc/config/alpha/alpha.opt
@@ -89,6 +89,11 @@ mlarge-text
 Target RejectNegative InverseMask(SMALL_TEXT)
 Emit indirect branches to local functions.
 
+mlra
+Target Var(alpha_lra_p) Undocumented
Use LRA for reload instead of the old reload framework.  This option is
+experimental, and it may be removed in future versions of the compiler.
+
 mtls-kernel
 Target Mask(TLS_KERNEL)
 Emit rdval instead of rduniq for thread pointer.
diff --git a/gcc/config/alpha/alpha.opt.urls b/gcc/config/alpha/alpha.opt.urls
index a55c08328c3..916a3013f63 100644
--- a/gcc/config/alpha/alpha.opt.urls
+++ b/gcc/config/alpha/alpha.opt.urls
@@ -44,6 +44,8 @@ UrlSuffix(gcc/DEC-Alpha-Options.html#index-msmall-data)
 mlarge-data
 UrlSuffix(gcc/DEC-Alpha-Options.html#index-mlarge-data)
 
+; skipping UrlSuffix for 'mlra' due to finding no URLs
+
 msmall-text
 UrlSuffix(gcc/DEC-Alpha-Options.html#index-msmall-text)
 
-- 
2.39.5



Re: [PATCH v2 34/36] arm: [MVE intrinsics] rework vadcq

2024-10-15 Thread Richard Earnshaw
On 14/10/2024 19:18, Richard Earnshaw (lists) wrote:
> On 04/09/2024 14:26, Christophe Lyon wrote:
>> Implement vadcq using the new MVE builtins framework.
>>
>> We re-use most of the code introduced by the previous patch to support
>> vadciq: we just need to initialize carry from the input parameter.
>>
>> 2024-08-28  Christophe Lyon  
>>
>>  gcc/
>>
>>  * config/arm/arm-mve-builtins-base.cc (vadcq_vsbc): Add support
>>  for vadcq.
>>  * config/arm/arm-mve-builtins-base.def (vadcq): New.
>>  * config/arm/arm-mve-builtins-base.h (vadcq): New.
>>  * config/arm/arm_mve.h (vadcq): Delete.
>>  (vadcq_m): Delete.
>>  (vadcq_s32): Delete.
>>  (vadcq_u32): Delete.
>>  (vadcq_m_s32): Delete.
>>  (vadcq_m_u32): Delete.
>>  (__arm_vadcq_s32): Delete.
>>  (__arm_vadcq_u32): Delete.
>>  (__arm_vadcq_m_s32): Delete.
>>  (__arm_vadcq_m_u32): Delete.
>>  (__arm_vadcq): Delete.
>>  (__arm_vadcq_m): Delete.
> 
>> +if (!m_init_carry)
>> +  {
>> +/* Prepare carry in:
>> +   set_fpscr ( (fpscr & ~0x20000000u)
>> +   | ((*carry & 1u) << 29) )  */
>> +rtx carry_in = gen_reg_rtx (SImode);
>> +rtx fpscr = gen_reg_rtx (SImode);
>> +emit_insn (gen_get_fpscr_nzcvqc (fpscr));
>> +emit_insn (gen_rtx_SET (carry_in, gen_rtx_MEM (SImode, carry_ptr)));
>> +
>> +emit_insn (gen_rtx_SET (carry_in,
>> +gen_rtx_ASHIFT (SImode,
>> +carry_in,
>> +GEN_INT (29))));
>> +emit_insn (gen_rtx_SET (carry_in,
>> +gen_rtx_AND (SImode,
>> + carry_in,
>> + GEN_INT (0x20000000))));
>> +emit_insn (gen_rtx_SET (fpscr,
>> +gen_rtx_AND (SImode,
>> + fpscr,
>> + GEN_INT (~0x20000000))));
>> +emit_insn (gen_rtx_SET (carry_in,
>> +gen_rtx_IOR (SImode,
>> + carry_in,
>> + fpscr)));
>> +emit_insn (gen_set_fpscr_nzcvqc (carry_in));
>> +  }
> 
> What's the logic here?  Are we just trying to set the C flag to *carry != 0 
> (is carry a bool?)?  Do we really need to preserve all the other bits in 
> NZCV?  I wouldn't have thought so, suggesting that:
> 
>   CMP *carry, #1  // Set C if *carry != 0
> 
> ought to be enough, without having to generate a read-modify-write sequence.

I realised last night that this is setting up the fpsr not the cpsr, so my 
suggestion won't work.  I am concerned that expanding this too early will leave 
something that we can't optimize away if we have back-to-back vadcq intrinsics 
that chain the carry, but I guess this is no different from what we have 
already.

On that basis, this patch is also OK.  We may need to revisit this sequence 
later to check that we are removing redundant reads + sets.
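
For reference, the kind of back-to-back chaining in question, as a hedged C
sketch using the public intrinsic (the redundancy is the FPSCR/memory
round-trip of the carry between the two calls):

#include <arm_mve.h>

int32x4_t
chain (int32x4_t a, int32x4_t b, int32x4_t c, unsigned *carry)
{
  /* The first add writes the carry out through *carry...  */
  int32x4_t t = vadcq_s32 (a, b, carry);
  /* ...and the second immediately reads it back in.  */
  return vadcq_s32 (t, c, carry);
}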

R.

> 
> R.
> 
>> ---
>>  gcc/config/arm/arm-mve-builtins-base.cc  | 61 +++--
>>  gcc/config/arm/arm-mve-builtins-base.def |  1 +
>>  gcc/config/arm/arm-mve-builtins-base.h   |  1 +
>>  gcc/config/arm/arm_mve.h | 87 
>>  4 files changed, 56 insertions(+), 94 deletions(-)
>>
>> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
>> b/gcc/config/arm/arm-mve-builtins-base.cc
>> index 6f3b18c2915..9c2e11356ef 100644
>> --- a/gcc/config/arm/arm-mve-builtins-base.cc
>> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
>> @@ -559,10 +559,19 @@ public:
>>  class vadc_vsbc_impl : public function_base
>>  {
>>  public:
>> +  CONSTEXPR vadc_vsbc_impl (bool init_carry)
>> +: m_init_carry (init_carry)
>> +  {}
>> +
>> +  /* Initialize carry with 0 (vadci).  */
>> +  bool m_init_carry;
>> +
>>unsigned int
>>call_properties (const function_instance &) const override
>>{
>>  unsigned int flags = CP_WRITE_MEMORY | CP_READ_FPCR;
>> +if (!m_init_carry)
>> +  flags |= CP_READ_MEMORY;
>>  return flags;
>>}
>>  
>> @@ -605,22 +614,59 @@ public:
>>  carry_ptr = e.args[carry_out_arg_no];
>>  e.args.ordered_remove (carry_out_arg_no);
>>  
>> +if (!m_init_carry)
>> +  {
>> +/* Prepare carry in:
>> +   set_fpscr ( (fpscr & ~0x20000000u)
>> +   | ((*carry & 1u) << 29) )  */
>> +rtx carry_in = gen_reg_rtx (SImode);
>> +rtx fpscr = gen_reg_rtx (SImode);
>> +emit_insn (gen_get_fpscr_nzcvqc (fpscr));
>> +emit_insn (gen_rtx_SET (carry_in, gen_rtx_MEM (SImode, carry_ptr)));
>> +
>> +emit_insn (gen_rtx_SET (carry_in,
>> +gen_rtx_ASHIFT (SImode,
>> +carry_in,
>> +GEN_INT (29))));
>> +emit_insn (gen_rtx_SET (carry_in,
>> +

[PATCH] match.pd: Further fma negation fixes [PR116891]

2024-10-15 Thread Jakub Jelinek
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> > PR middle-end/116891
> > * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> > Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
> 
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.

I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.

Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.
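For intuition about why the folds are invalid under -frounding-math: with a
directed rounding mode, negation does not commute with the fused operation.
A minimal standalone demonstration, separate from the new testcases (compile
with -frounding-math so the calls are not folded at compile time):

#include <fenv.h>
#include <math.h>
#include <stdio.h>

int
main (void)
{
  fesetround (FE_DOWNWARD);
  volatile double a = 1.0, b = 1.0, c = 0x1p-60;
  /* a*b + c = 1 + 2^-60 rounds down to 1.0, so r1 == -1.0.  */
  double r1 = -fma (a, b, c);
  /* (-a)*b + (-c) = -(1 + 2^-60) rounds down past -1.0, so r2 is the
     next representable double below -1.0; r1 and r2 differ by one ulp.  */
  double r2 = fma (-a, b, -c);
  printf ("%a vs %a\n", r1, r2);
  return 0;
}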

Ok for trunk if it passes full bootstrap/regtest?

2024-10-15  Jakub Jelinek  

PR middle-end/116891
* match.pd ((negate (fmas@3 @0 @1 @2)) -> (IFN_FNMS @0 @1 @2)):
Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
((negate (IFN_FMS@3 @0 @1 @2)) -> (IFN_FNMA @0 @1 @2)): Likewise.
((negate (IFN_FNMA@3 @0 @1 @2)) -> (IFN_FMS @0 @1 @2)): Likewise.

* gcc.dg/pr116891.c: New test.
* gcc.target/i386/fma-pr116891.c: New test.

--- gcc/match.pd.jj 2024-10-15 12:50:47.699905473 +0200
+++ gcc/match.pd	2024-10-15 12:57:19.547400416 +0200
@@ -9452,7 +9452,7 @@ (define_operator_list SYNC_FETCH_AND_AND
(IFN_FNMS @0 @1 @2))
   (simplify
(negate (fmas@3 @0 @1 @2))
-   (if (single_use (@3))
+   (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
     (IFN_FNMS @0 @1 @2))))
 
  (simplify
@@ -9466,7 +9466,7 @@ (define_operator_list SYNC_FETCH_AND_AND
   (IFN_FNMA @0 @1 @2))
  (simplify
   (negate (IFN_FMS@3 @0 @1 @2))
-   (if (single_use (@3))
+   (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
 (IFN_FNMA @0 @1 @2)))
 
  (simplify
@@ -9480,7 +9480,7 @@ (define_operator_list SYNC_FETCH_AND_AND
   (IFN_FMS @0 @1 @2))
  (simplify
   (negate (IFN_FNMA@3 @0 @1 @2))
-  (if (single_use (@3))
+  (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
(IFN_FMS @0 @1 @2)))
 
  (simplify
--- gcc/testsuite/gcc.dg/pr116891.c.jj  2024-10-15 12:31:57.619723244 +0200
+++ gcc/testsuite/gcc.dg/pr116891.c 2024-10-15 12:56:37.809987933 +0200
@@ -0,0 +1,47 @@
+/* PR middle-end/116891 */
+/* { dg-do run } */
+/* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-require-effective-target c99_runtime } */
+/* { dg-options "-O2 -frounding-math" } */
+
+#include <fenv.h>
+
+__attribute__((noipa)) double
+foo (double x, double y, double z)
+{
+  return -__builtin_fma (-x, y, -z);
+}
+
+__attribute__((noipa)) double
+bar (double x, double y, double z)
+{
+  return -__builtin_fma (-x, y, z);
+}
+
+__attribute__((noipa)) double
+baz (double x, double y, double z)
+{
+  return -__builtin_fma (x, y, -z);
+}
+
+__attribute__((noipa)) double
+qux (double x, double y, double z)
+{
+  return -__builtin_fma (x, y, z);
+}
+
+int
+main ()
+{
+#if defined (FE_DOWNWARD) && __DBL_MANT_DIG__ == 53 && __DBL_MAX_EXP__ == 1024
+  fesetround (FE_DOWNWARD);
+  double a = foo (-0x1.p256, 0x1.p256, 0x1.p-256);
+  if (a != -__builtin_nextafter (0x1p256 * 0x1p256, 0.))
+__builtin_abort ();
+  if (a != bar (-0x1.p256, 0x1.p256, -0x1.p-256)
+  || a != baz (0x1.p256, 0x1.p256, 0x1.p-256)
+  || a != qux (0x1.p256, 0x1.p256, -0x1.p-256))
+__builtin_abort ();
+#endif
+}
--- gcc/testsuite/gcc.target/i386/fma-pr116891.c.jj 2024-10-15 
12:42:39.711719596 +0200
+++ gcc/testsuite/gcc.target/i386/fma-pr116891.c2024-10-15 
12:44:56.692806834 +0200
@@ -0,0 +1,19 @@
+/* PR middle-end/116891 */
+/* { dg-do run } */
+/* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-require-effective-target c99_runtime } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O2 -mfma -frounding-math" } */
+
+#include <fenv.h>
+#include "fma-check.h"
+
+#define main() do_main ()
+#include "../../gcc.dg/pr116891.c"
+
+static void
+fma_test (void)
+{
+  do_main ();
+}


Jakub



Re: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Richard Biener
On Tue, 15 Oct 2024, Tamar Christina wrote:

> Hi,
> 
> Thanks for the look,
> 
> The 10/15/2024 09:54, Richard Biener wrote:
> > On Mon, 14 Oct 2024, Tamar Christina wrote:
> > 
> > > Hi All,
> > > 
> > > This patch series adds support for a target to do a direct conversion for 
> > > zero
> > > extends using permutes.
> > > 
> > > To do this it uses a target hook use_permute_for_promotion which must be
> > > implemented by targets.  This hook is used to indicate:
> > > 
> > >  1. can a target do this for the given modes.
> > 
> > can_vec_perm_const_p?
> > 
> > >  3. can the target convert between various vector modes with a 
> > > VIEW_CONVERT.
> > 
> > We have modes_tieable_p for this I think.
> > 
> 
> Yes, though the reason I didn't use either of them was because they are 
> reporting
> a capability of the backend.  In which case the hook, which is already
> backend-specific, should answer these two.
> 
> I initially had these checks there, but they didn't seem to add value, for
> promotions the masks are only dependent on the input and output modes. So 
> they really
> don't change.
> 
> When you have say a loop that does lots of conversions from say char to int, 
> it seemed
> like a waste to retest the same permute constants over and over again.
> 
> I can add them back in if you prefer...
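For reference, the capability check itself would only be a few lines (a
hypothetical fragment against the current trunk interface; sel, nunits and
the fallback are placeholders):

  vec_perm_indices indices (sel, 2, nunits);
  if (!can_vec_perm_const_p (result_mode, op_mode, indices))
    /* Target cannot do this permute as a constant; keep vec_unpack.  */
    return false;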
> 
> > >  2. is it profitable for the target to do it.
> > 
> > So you say the target can do both ways but both zip and tbl are
> > permute instructions so I really fail to see the point and why
> > the target itself doesn't choose to use tbl for unpack.
> > 
> > Is the intent in the end to have VEC_PERM in the IL rather than
> > VEC_UNPACK_* so it combines with other VEC_PERMs?
> > 
> 
> Yes, and this happens quite often, e.g. load permutes or lane shuffles etc.
> The reason for exposing them as VEC_PERM was to trigger further optimizations.
> 
> If you remember the ticket about LOAD_LANES, with this optimization and an 
> open
> encoding of LOAD_LANES we stop using it in cases where there's a zero extend 
> after
> the LOAD_LANES, because then you're doing effectively two permutes and the 
> LOAD_LANES
> is no longer beneficial. There are other examples, load and replicate etc.
> 
> > That said, I'm not against supporting VEC_PERM code gen from
> > unsigned promotion but I don't see why we should do this when
> > the target advertises VEC_UNPACK_* support or direct conversion
> > support?
> > 
> > Esp. with adding a "local" cost related hook which cannot take
> > into account context.
> > 
> 
> To summarize a long story:
> 
>   yes I open encode zero extends as permutes to allow further optimizations.  
> One could convert
>   vec_unpacks to convert optabs and use that, but that is an opaque value 
> that can't be further
>   optimized.
> 
>   The hook isn't really a costing thing in the general sense. It's literally 
> just "do you want
>   permutes yes or no".  The reason it gets the modes is simply that I don't 
> think a single level
>   extend is worth it, but I can just change it to never try to do this on 
> more than one level.

When you mention LOAD_LANES we do not expose "permutes" in them on GIMPLE
either, so why should we for VEC_UNPACK_*.

At what level are the simplifications you see happening then?

I do realize we have two ways of expressing zero-extending widenings
(also truncations btw) and that's always bad - so we could decide to
_always_ use VEC_PERMs as the canonical representation because those
combine more easily.  And either match VEC_PERMs back to vec_unpack
at RTL expansion time or require targets to expose those as constant
vec_perms as well.  There are targets like GCN where you can't do
unpacking with permutes of course, so we can't do away with them
(we could possibly force those targets to expose widening/truncation
solely with [us]ext and trunc patterns of course).

> I think there's a lot of merit in open-encoding zero extends, but one 
> reason this is
> beneficial on AArch64 for instance is that we can consume the zero register 
> and rewrite the
> indices to a single register TBL.  Two registers TBLs are slower on some 
> implementations.

But this latter fact can be done by optimizing the RTL?

Richard.

> Thanks,
> Tamar
> 
> > > Using permutations has a big benefit for multi-step zero extensions 
> > > because they
> > > both reduce the number of needed instructions and increase 
> > > throughput, as
> > > the dependency chain is removed.
> > > 
> > > Concretely on AArch64 this changes:
> > > 
> > > void test4(unsigned char *x, long long *y, int n) {
> > > for(int i = 0; i < n; i++) {
> > > y[i] = x[i];
> > > }
> > > }
> > > 
> > > from generating:
> > > 
> > > .L4:
> > > ldr q30, [x4], 16
> > > add x3, x3, 128
> > > zip1    v1.16b, v30.16b, v31.16b
> > > zip2    v30.16b, v30.16b, v31.16b
> > > zip1    v2.8h, v1.8h, v31.8h
> > > zip1    v0.8h, v30.8h, v31.8h
> > > zip2    v1.8h
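To make the zip-vs-tbl point concrete: a u8->u32 zero extend is just a byte
shuffle against a zero vector, which is roughly what the series expresses as
a VEC_PERM_EXPR.  A minimal sketch using GCC vector extensions (little-endian
lane order assumed):

typedef unsigned char v16u8 __attribute__ ((vector_size (16)));
typedef unsigned int  v4u32 __attribute__ ((vector_size (16)));

/* Zero-extend the low four bytes of X to 32 bits each: indices 0-15
   select from X, 16-31 select from the zero vector, so each 32-bit
   lane becomes { byte, 0, 0, 0 }.  */
v4u32
zext_lo4 (v16u8 x)
{
  v16u8 zero = { 0 };
  v16u8 r = __builtin_shufflevector (x, zero,
                                     0, 16, 16, 16,
                                     1, 16, 16, 16,
                                     2, 16, 16, 16,
                                     3, 16, 16, 16);
  return (v4u32) r;
}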

Re: [PATCH] match.pd: Further fma negation fixes [PR116891]

2024-10-15 Thread Richard Biener
On Tue, 15 Oct 2024, Jakub Jelinek wrote:

> On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> > >   PR middle-end/116891
> > >   * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> > >   Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
> > 
> > Guess it would be nice to have a testcase which FAILs without the patch and
> > PASSes with it, but it can be added later.
> 
> I've added such a testcase now, and additionally found the fix only fixed
> one of the 4 problematic similar cases.
> 
> Here is a patch which fixes the others too and adds the testcases.
> fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
> only due to the bar/baz/qux checks) and PASSes with the patch.

Whoops - I did search but for some reasons I was blind ...

> Ok for trunk if it passes full bootstrap/regtest?

OK.

Thanks a lot,
Richard.

> 2024-10-15  Jakub Jelinek  
> 
>   PR middle-end/116891
>   * match.pd ((negate (fmas@3 @0 @1 @2)) -> (IFN_FNMS @0 @1 @2)):
>   Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>   ((negate (IFN_FMS@3 @0 @1 @2)) -> (IFN_FNMA @0 @1 @2)): Likewise.
>   ((negate (IFN_FNMA@3 @0 @1 @2)) -> (IFN_FMS @0 @1 @2)): Likewise.
> 
>   * gcc.dg/pr116891.c: New test.
>   * gcc.target/i386/fma-pr116891.c: New test.
> 
> --- gcc/match.pd.jj   2024-10-15 12:50:47.699905473 +0200
> +++ gcc/match.pd  2024-10-15 12:57:19.547400416 +0200
> @@ -9452,7 +9452,7 @@ (define_operator_list SYNC_FETCH_AND_AND
> (IFN_FNMS @0 @1 @2))
>(simplify
> (negate (fmas@3 @0 @1 @2))
> -   (if (single_use (@3))
> +   (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
>  (IFN_FNMS @0 @1 @2))))
>  
>   (simplify
> @@ -9466,7 +9466,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>(IFN_FNMA @0 @1 @2))
>   (simplify
>(negate (IFN_FMS@3 @0 @1 @2))
> -   (if (single_use (@3))
> +   (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
>  (IFN_FNMA @0 @1 @2)))
>  
>   (simplify
> @@ -9480,7 +9480,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>(IFN_FMS @0 @1 @2))
>   (simplify
>(negate (IFN_FNMA@3 @0 @1 @2))
> -  (if (single_use (@3))
> +  (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
> (IFN_FMS @0 @1 @2)))
>  
>   (simplify
> --- gcc/testsuite/gcc.dg/pr116891.c.jj2024-10-15 12:31:57.619723244 
> +0200
> +++ gcc/testsuite/gcc.dg/pr116891.c   2024-10-15 12:56:37.809987933 +0200
> @@ -0,0 +1,47 @@
> +/* PR middle-end/116891 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target fenv } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-require-effective-target c99_runtime } */
> +/* { dg-options "-O2 -frounding-math" } */
> +
> +#include <fenv.h>
> +
> +__attribute__((noipa)) double
> +foo (double x, double y, double z)
> +{
> +  return -__builtin_fma (-x, y, -z);
> +}
> +
> +__attribute__((noipa)) double
> +bar (double x, double y, double z)
> +{
> +  return -__builtin_fma (-x, y, z);
> +}
> +
> +__attribute__((noipa)) double
> +baz (double x, double y, double z)
> +{
> +  return -__builtin_fma (x, y, -z);
> +}
> +
> +__attribute__((noipa)) double
> +qux (double x, double y, double z)
> +{
> +  return -__builtin_fma (x, y, z);
> +}
> +
> +int
> +main ()
> +{
> +#if defined (FE_DOWNWARD) && __DBL_MANT_DIG__ == 53 && __DBL_MAX_EXP__ == 
> 1024
> +  fesetround (FE_DOWNWARD);
> +  double a = foo (-0x1.p256, 0x1.p256, 0x1.p-256);
> +  if (a != -__builtin_nextafter (0x1p256 * 0x1p256, 0.))
> +__builtin_abort ();
> +  if (a != bar (-0x1.p256, 0x1.p256, -0x1.p-256)
> +  || a != baz (0x1.p256, 0x1.p256, 0x1.p-256)
> +  || a != qux (0x1.p256, 0x1.p256, -0x1.p-256))
> +__builtin_abort ();
> +#endif
> +}
> --- gcc/testsuite/gcc.target/i386/fma-pr116891.c.jj   2024-10-15 
> 12:42:39.711719596 +0200
> +++ gcc/testsuite/gcc.target/i386/fma-pr116891.c  2024-10-15 
> 12:44:56.692806834 +0200
> @@ -0,0 +1,19 @@
> +/* PR middle-end/116891 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target fenv } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-require-effective-target c99_runtime } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O2 -mfma -frounding-math" } */
> +
> +#include <fenv.h>
> +#include "fma-check.h"
> +
> +#define main() do_main ()
> +#include "../../gcc.dg/pr116891.c"
> +
> +static void
> +fma_test (void)
> +{
> +  do_main ();
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Richard Biener
On Tue, 15 Oct 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 15, 2024 12:13 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends using
> > VEC_PERM_EXPR
> > 
> > On Tue, 15 Oct 2024, Tamar Christina wrote:
> > 
> > > Hi,
> > >
> > > Thanks for the look,
> > >
> > > The 10/15/2024 09:54, Richard Biener wrote:
> > > > On Mon, 14 Oct 2024, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch series adds support for a target to do a direct conversion 
> > > > > for zero
> > > > > extends using permutes.
> > > > >
> > > > > To do this it uses a target hook use_permute_for_promotion which must 
> > > > > be
> > > > > implemented by targets.  This hook is used to indicate:
> > > > >
> > > > >  1. can a target do this for the given modes.
> > > >
> > > > can_vec_perm_const_p?
> > > >
> > > > >  3. can the target convert between various vector modes with a
> > VIEW_CONVERT.
> > > >
> > > > We have modes_tieable_p for this I think.
> > > >
> > >
> > > Yes, though the reason I didn't use either of them was because they are 
> > > reporting
> > > a capability of the backend.  In which case the hook, which is already 
> > > backend
> > > specific, should answer these two.
> > >
> > > I initially had these checks there, but they didn't seem to add value, for
> > > promotions the masks are only dependent on the input and output modes. So
> > they really
> > > don't change.
> > >
> > > When you have say a loop that does lots of conversions from say char to 
> > > int, it
> > seemed
> > > like a waste to retest the same permute constants over and over again.
> > >
> > > I can add them back in if you prefer...
> > >
> > > > >  2. is it profitable for the target to do it.
> > > >
> > > > So you say the target can do both ways but both zip and tbl are
> > > > permute instructions so I really fail to see the point and why
> > > > the target itself doesn't choose to use tbl for unpack.
> > > >
> > > > Is the intent in the end to have VEC_PERM in the IL rather than
> > > > VEC_UNPACK_* so it combines with other VEC_PERMs?
> > > >
> > >
> > > Yes, and this happens quite often, e.g. load permutes or lane shuffles 
> > > etc.
> > > The reason for exposing them as VEC_PERM was to trigger further 
> > > optimizations.
> > >
> > > If you remember the ticket about LOAD_LANES, with this optimization and an
> > open
> > > encoding of LOAD_LANES we stop using it in cases where theres a zero 
> > > extend
> > after
> > > the LOAD_LANES, because then you're doing effectively two permutes and the
> > LOAD_LANES
> > > is no longer beneficial. There are other examples, load and replicate etc.
> > >
> > > > That said, I'm not against supporting VEC_PERM code gen from
> > > > unsigned promotion but I don't see why we should do this when
> > > > the target advertises VEC_UNPACK_* support or direct conversion
> > > > support?
> > > >
> > > > Esp. with adding a "local" cost related hook which cannot take
> > > > into account context.
> > > >
> > >
> > > To summarize a long story:
> > >
> > >   yes I open encode zero extends as permutes to allow further 
> > > optimizations.
> > One could convert
> > >   vec_unpacks to convert optabs and use that, but that is an opague value 
> > > that
> > can't be further
> > >   optimized.
> > >
> > >   The hook isn't really a costing thing in the general sense. It's 
> > > literally just "do you
> > want
> > >   permutes yes or no".  The reason it gets the modes is simply that I 
> > > don't think a
> > single level
> > >   extend is worth it, but I can just change it to never try to do this on 
> > > more than
> > one level.
> > 
> > When you mention LOAD_LANES we do not expose "permutes" in them on
> > GIMPLE
> > either, so why should we for VEC_UNPACK_*.
> 
> I think not exposing LOAD_LANES in GIMPLE *is* an actual mistake that I hope 
> to correct in GCC-16.
> Or at least the time we pick LOAD_LANES is too early.  So I don't think 
> pointing to this is a convincing
> argument.  It's only VLA that I think needs the IL because you have to mask 
> the group of operations and
> may be hard to reconcile that later on.
> 
> > At what level are the simplifications you see happening then?
> 
> Well, they are currently happening outside of the vectorizer passes itself,
> more specifically in this case because VN runs match simplifications.

But match doesn't simplify permutes against .LOAD_LANES?  So it's about
"other" permutes (from loads) that get simplified?

> If the concern is that that's late I can lift it to a pattern I suppose.
> I didn't use a pattern because similar changes in this area always just 
> happened
> at codegen.

I was wondering how this plays with my idea of having us "lower"
or rather "code generate" to an intermediate SLP representation where
we split SLP groups on vector boundaries and are then free to
perform

Re: [PATCH 2/4]middle-end: Fix VEC_PERM_EXPR lowering since relaxation of vector sizes

2024-10-15 Thread Richard Biener
On Mon, 14 Oct 2024, Tamar Christina wrote:

> Hi All,
> 
> In GCC 14 VEC_PERM_EXPR was relaxed to be able to permute to a 2x larger 
> vector
> than the size of the input vectors.  However various passes and 
> transformations
> were not updated to account for this.
> 
> I have patches in these area that I will be upstreaming with individual 
> patches
> that expose them.
> 
> This one is that vector lowering tries to lower based on the size of the input 
> vectors
> rather than the size of the output.  As a consequence it creates an invalid
> vector of half the size.
> 
> Luckily we ICE because the resulting nunits doesn't match the vector size.
> 
> Tests in the AArch64 patch test for this behaviour.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?

OK.

Do you have a testcase btw?

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-generic.cc (lower_vec_perm): Use output vector size instead
>   of input vector when determining output nunits.
> 
> ---
> diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> index 
> 3041fb8fcf235ba86f37ef73aa089330a2fd0b77..f86f7eabb255fde50b30fa3b85db367df930f321
>  100644
> --- a/gcc/tree-vect-generic.cc
> +++ b/gcc/tree-vect-generic.cc
> @@ -1500,6 +1500,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
>tree mask = gimple_assign_rhs3 (stmt);
>tree vec0 = gimple_assign_rhs1 (stmt);
>tree vec1 = gimple_assign_rhs2 (stmt);
> +  tree res_vect_type = TREE_TYPE (gimple_assign_lhs (stmt));
>tree vect_type = TREE_TYPE (vec0);
>tree mask_type = TREE_TYPE (mask);
>tree vect_elt_type = TREE_TYPE (vect_type);
> @@ -1512,7 +1513,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
>location_t loc = gimple_location (gsi_stmt (*gsi));
>unsigned i;
>  
> -  if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
> +  if (!TYPE_VECTOR_SUBPARTS (res_vect_type).is_constant (&elements))
>  return;
>  
>if (TREE_CODE (mask) == SSA_NAME)
> @@ -1672,9 +1673,9 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
>  }
>  
>if (constant_p)
> -constr = build_vector_from_ctor (vect_type, v);
> +constr = build_vector_from_ctor (res_vect_type, v);
>else
> -constr = build_constructor (vect_type, v);
> +constr = build_constructor (res_vect_type, v);
>gimple_assign_set_rhs_from_tree (gsi, constr);
>update_stmt (gsi_stmt (*gsi));
>  }
> 
> 
> 
> 
> 
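For reference, the shape of permute that trips this is one whose result is
wider than its inputs; a reduced sketch with GCC vector extensions (not the
actual regression test):

typedef short v4hi __attribute__ ((vector_size (8)));
typedef short v8hi __attribute__ ((vector_size (16)));

/* The result has twice the elements of either input, so lowering must
   take the element count from the lhs type, not from vec0.  */
v8hi
widen_perm (v4hi a, v4hi b)
{
  return __builtin_shufflevector (a, b, 0, 4, 1, 5, 2, 6, 3, 7);
}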

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread Maciej W. Rozycki
On Tue, 15 Oct 2024, Uros Bizjak wrote:

> > PR target/66207
> > * config/alpha/alpha.opt (mlra): New target option.
> > * config/alpha/alpha.cc (alpha_use_lra_p): New function.
> > (TARGET_LRA_P): Use it.
> > * config/alpha/alpha.opt.urls: Regenerate.
> 
> IMO, we should simply deprecate non-BWX targets. If reload is going
> away, then there is no way for non-BWX targets to access reload
> internals they require for compilation. As mentioned in the PR,
> non-BWX targets are removed from distros anyway, so I guess there is
> no point to invest much time to modernize them,

 Well, I have a lasting desire to keep non-BWX Alphas running, under Linux 
in particular, and I'm going to look into any issues around it; reload vs 
LRA is all software, so things can always be sorted one way or another.

 While I've been distracted by other matters lately, such as hardware 
failures that had to be dealt with urgently, this is now my priority #1 
and I do hope to have at least some critical initial stuff in with this 
release cycle (noting that only ~5 weeks are left).

 NB I spoke to Richard about it while at LPC 2024 recently.

  Maciej


RE: [PATCH 4/4]middle-end: create the longest possible zero extend chain after overwidening

2024-10-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 15, 2024 1:42 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH 4/4]middle-end: create the longest possible zero extend 
> chain
> after overwidening
> 
> On Mon, 14 Oct 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > Consider loops such as:
> >
> > void test9(unsigned char *x, long long *y, int n, unsigned char k) {
> > for(int i = 0; i < n; i++) {
> > y[i] = k + x[i];
> > }
> > }
> >
> > where today we generate:
> >
> > .L5:
> > ldr q29, [x5], 16
> > add x4, x4, 128
> > uaddl   v1.8h, v29.8b, v30.8b
> > uaddl2  v29.8h, v29.16b, v30.16b
> > zip1    v2.8h, v1.8h, v31.8h
> > zip1    v0.8h, v29.8h, v31.8h
> > zip2    v1.8h, v1.8h, v31.8h
> > zip2    v29.8h, v29.8h, v31.8h
> > sxtl    v25.2d, v2.2s
> > sxtl    v28.2d, v0.2s
> > sxtl    v27.2d, v1.2s
> > sxtl    v26.2d, v29.2s
> > sxtl2   v2.2d, v2.4s
> > sxtl2   v0.2d, v0.4s
> > sxtl2   v1.2d, v1.4s
> > sxtl2   v29.2d, v29.4s
> > stp q25, q2, [x4, -128]
> > stp q27, q1, [x4, -96]
> > stp q28, q0, [x4, -64]
> > stp q26, q29, [x4, -32]
> > cmp x5, x6
> > bne .L5
> >
> > Note how the zero extend from short to long is halfway through the chain 
> > transformed
> > into a sign extend.  There are two problems with this:
> >
> >   1. sign extends are typically slower than zero extends on many uArches.
> >   2. it prevents vectorizable_conversion from attempting to do a single step
> >  promotion.
> >
> > These sign extends happen due to the various range reduction optimizations and
> > patterns we have, such as multiplication widening, etc.
> >
> > My first attempt to fix this was just updating the patterns so that when the 
> > original
> > source is a zero extend, they do not add the intermediate sign extend.
> >
> > However this behavior happens in many other places, and as new
> > patterns get added the problem can be re-introduced.
> >
> > Instead I have added a new pattern vect_recog_zero_extend_chain_pattern that
> > attempts to simplify and extend an existing zero extend over multiple
> > conversions statements.
> >
> > As an example, T3 a = (T3)(signed T2)(unsigned T1)x where bitsize T3 > T2 > 
> > T1
> > gets transformed into T3 a = (T3)(signed T2)(unsigned T2)x.
> >
> > The final cast to signed is kept so the types in the tree still match. It 
> > will
> > be correctly elided later on.
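Spelled out in scalar C for the smallest instance (a hypothetical rendering;
the pattern itself operates on the vectorized statements):

/* T1 = unsigned char, T2 = short, T3 = long long.  Before: the chain
   zero extends to T1, then sign extends through T2 to T3.  */
long long
before (unsigned short x)
{
  return (long long)(short)(unsigned char) x;
}

/* After the pattern, when Ranger can show the top bits are already zero,
   the chain becomes conceptually (long long)(short)(unsigned short) x:
   the zero extend is widened to T2 and the remaining signed cast is a
   no-op that is later elided.  */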
> >
> > This representation is optimal, as vectorizable_conversion is already
> > able to decompose a long promotion into multiple steps if the target does 
> > not
> > support it in a single step.  More importantly it allows us to do proper 
> > costing
> > and support such conversions like (double)x, where bitsize(x) < int in an
> > efficient manner.
> >
> > To do this I have used Ranger's on-demand analysis to perform the check to 
> > see
> > if an extension can be removed or extended into a zero extend.  The reason for 
> > this
> > is that the vectorizer introduces several patterns that are not in the IL,  
> > but
> > also lots of widening IFNs for which handling in a switch wouldn't be very
> > future proof.
> >
> > I did try to do it without Ranger, but ranger had two benefits:
> >
> > 1.  It simplified the handling of the IL changes the vectorizer introduces, 
> > and
> > makes it future proof.
> > 2.  Ranger has the advantage of doing the transformation in cases where it
> knows
> > that the top bits of the value are zero.  Which we wouldn't be able to 
> > tell
> > by looking purely at statements.
> > 3.  Ranger simplified the handling of corner cases.  Without it the 
> > handling was
> > quite complex and I wasn't very confident in its correctness.
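A sketch of the kind of query behind point 2 above (a hypothetical helper;
it only assumes Ranger's public range_of_expr interface):

/* Return true if NAME's value at STMT is known to lie in [0, 2^NBITS - 1],
   i.e. the top bits are zero and a sign extend from an NBITS-wide type
   can be rewritten as a zero extend.  */
static bool
top_bits_known_zero_p (tree name, gimple *stmt, unsigned nbits)
{
  int_range_max r;
  if (!get_range_query (cfun)->range_of_expr (r, name, stmt)
      || r.undefined_p () || r.varying_p ())
    return false;
  unsigned prec = TYPE_PRECISION (TREE_TYPE (name));
  return wi::ges_p (r.lower_bound (), 0)
	 && wi::les_p (r.upper_bound (), wi::mask (nbits, false, prec));
}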
> >
> > So I think ranger is the right way to go here...  With these changes the 
> > above
> > now generates:
> >
> > .L5:
> > add x4, x4, 128
> > ldr q26, [x5], 16
> > uaddl   v2.8h, v26.8b, v31.8b
> > uaddl2  v26.8h, v26.16b, v31.16b
> > tbl v4.16b, {v2.16b}, v30.16b
> > tbl v3.16b, {v2.16b}, v29.16b
> > tbl v24.16b, {v2.16b}, v28.16b
> > tbl v1.16b, {v26.16b}, v30.16b
> > tbl v0.16b, {v26.16b}, v29.16b
> > tbl v25.16b, {v26.16b}, v28.16b
> > tbl v2.16b, {v2.16b}, v27.16b
> > tbl v26.16b, {v26.16b}, v27.16b
> > stp q4, q3, [x4, -128]
> > stp q1, q0, [x4, -64]
> > stp q24, q2, [x4, -96]
> > stp q25, q26, [x4, -32]
> > cmp x5, x6
> > bne .L5
> >
> > I have also seen similar improvements in codegen on Arm and x86_64, 
> > especially
> > with AVX512.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> > x86_64-pc-linux-gnu -m32, -m

Re: [PATCH] c++: Restore rust front-end build [PR117114]

2024-10-15 Thread Jason Merrill

On 10/13/24 7:55 AM, Simon Martin wrote:

The patch that I merged via r15-4282-g60163c85730e6b breaks the build
for the rust front-end because it does not work well when virtual
inheritance is in play.

The problem is that in such a case, an overrider and its overridden base
method might have a different DECL_VINDEX, and the derived method would
be incorrectly considered as hiding the base one.

This patch fixes this by not comparing the DECL_VINDEX anymore, but
rather going back to comparing the signatures, only after having
excluded conversion operators to different types.
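A minimal sketch of the kind of hierarchy involved (illustrative only, not
the rust front-end code itself):

struct A { virtual void f (); };

/* With a virtual base, B::f overrides A::f, yet the two methods can end
   up with different DECL_VINDEX values, so comparing DECL_VINDEX wrongly
   classified B::f as hiding A::f rather than overriding it.  */
struct B : virtual A { void f () override; };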


Incidentally, it seems I was wrong to say you can just compare 
DECL_NAME: the name ends up being different in case of typedefs like in 
inherit/virtual14.C, so we do need to compare the type after all.


Jason



Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread Oleg Endo
On Tue, 2024-10-15 at 15:06 +0200, John Paul Adrian Glaubitz wrote:

> On Tue, 2024-10-15 at 14:47 +0200, Richard Biener wrote:
> > If you can provide -mlra vs. -mno-lra testsuite results as well that
> > would be interesting.
> 
> OK, I'll try to provide these.
> 
> > Does "just work" mean you can build the compiler and its target
> > libraries?
> 
> I'm performing a full native bootstrap build with all languages except D,
> Go and Rust enabled. This failed with an ICE with the default baseline
> at some point, but continues to build for a while now with --with-cpu=ev56.

Bootstrap catches only some issues, not all.  E.g. see current SH LRA status
(bootstrap OK, still produces silent wrong-code and other issues).  Hence
the request to also run the testsuite (make -k check) and diff the .sum
files.

Best regards,
Oleg Endo


Re: [PATCH] c++: unifying lvalue vs rvalue (non-forwarding) ref [PR116710]

2024-10-15 Thread Jason Merrill

On 10/15/24 12:47 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


-- >8 --

When unifying two (non-forwarding) reference types, unify immediately
recurses into the reference type without first comparing rvalueness.
(Note that at this point forwarding references have already been
collapsed into non-references by maybe_adjust_types_for_deduction.)

gcc/cp/ChangeLog:

* pt.cc (unify) <case REFERENCE_TYPE>: Compare rvalueness.

gcc/testsuite/ChangeLog:

* g++.dg/template/unify12.C: New test.
---
  gcc/cp/pt.cc|  3 ++-
  gcc/testsuite/g++.dg/template/unify12.C | 24 
  2 files changed, 26 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/unify12.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index af784a41265..c7cbf6df26c 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -25154,7 +25154,8 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
}
  
  case REFERENCE_TYPE:

-  if (!TYPE_REF_P (arg))
+  if (!TYPE_REF_P (arg)
+ || TYPE_REF_IS_RVALUE (parm) != TYPE_REF_IS_RVALUE (arg))
return unify_type_mismatch (explain_p, parm, arg);
return unify (tparms, targs, TREE_TYPE (parm), TREE_TYPE (arg),
strict & UNIFY_ALLOW_MORE_CV_QUAL, explain_p);
diff --git a/gcc/testsuite/g++.dg/template/unify12.C 
b/gcc/testsuite/g++.dg/template/unify12.C
new file mode 100644
index 000..84e4efb4cd9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/unify12.C
@@ -0,0 +1,24 @@
+// PR c++/116710
+// { dg-do compile { target c++11 } }
+
+template  struct A : T {};
+
+template 
+void f(void (*)(T &), typename A::type * = 0);
+
+void f(...);
+
+void g(int &&);
+
+void q() { f(g); } // OK
+
+template
+struct B { operator B(); };
+
+template
+void h(B);
+
+int main() {
+  B b;
+  h(b); // { dg-error "no match" }
+}




Re: [PATCH] c++: checking ICE w/ lambda targ inside constexpr if [PR117054]

2024-10-15 Thread Jason Merrill

On 10/15/24 12:48 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


-- >8 --

Here we're tripping over the assert in extract_locals_r which enforces
that an extra-args tree appearing inside another extra-args tree doesn't
actually have extra args.  This invariant no longer always holds for
lambdas (which recently gained the extra-args mechanism), but it should
be safe to just disable the assert for them since cp_walk_subtrees doesn't
walk LAMBDA_EXPR_EXTRA_ARGS and so should be immune to the PR114303 issue
for now.

PR c++/117054

gcc/cp/ChangeLog:

* pt.cc (extract_locals_r): Disable tree_extra_args assert
for LAMBDA_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ9.C: New test.
---
  gcc/cp/pt.cc  |  7 ++-
  gcc/testsuite/g++.dg/cpp2a/lambda-targ9.C | 16 
  2 files changed, 22 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ9.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c7cbf6df26c..b90447ae6a2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -13479,7 +13479,12 @@ extract_locals_r (tree *tp, int *walk_subtrees, void 
*data_)
 outermost tree.  Nested *_EXTRA_ARGS should naturally be empty since
 the outermost (extra-args) tree will intercept any substitution before
 a nested tree can.  */
-gcc_checking_assert (tree_extra_args (*tp) == NULL_TREE);
+gcc_checking_assert (tree_extra_args (*tp) == NULL_TREE
+   /* Except a lambda nested inside an extra-args tree
+  can have extra args if we deferred partial
+  substitution into it at template parse time.  But
+  we don't walk LAMBDA_EXPR_EXTRA_ARGS anyway.  */
+|| TREE_CODE (*tp) == LAMBDA_EXPR);
  
if (TREE_CODE (*tp) == DECL_EXPR)

  {
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ9.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ9.C
new file mode 100644
index 000..00dd4b2406e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ9.C
@@ -0,0 +1,16 @@
+// PR c++/117054
+// { dg-do compile { target c++20 } }
+
+template
+constexpr bool a = true;
+
+template
+void f() {
+  [](auto) {
+if constexpr (a<>) {}
+  };
+}
+
+int main() {
+  f();
+}




Re: [PATCH] c++: Restore rust front-end build [PR117114]

2024-10-15 Thread Simon Martin
Hi Jason,

On 15 Oct 2024, at 15:18, Jason Merrill wrote:

> On 10/13/24 7:55 AM, Simon Martin wrote:
>> The patch that I merged via r15-4282-g60163c85730e6b breaks the build
>> for the rust front-end because it does not work well when virtual
>> inheritance is in play.
>>
>> The problem is that in such a case, an overrider and its overridden 
>> base
>> method might have a different DECL_VINDEX, and the derived method 
>> would
>> be incorrectly considered as hiding the base one.
>>
>> This patch fixes this by not comparing the DECL_VINDEX anymore, but
>> rather going back to comparing the signatures, only after having
>> excluded conversion operators to different types.
>
> Incidentally, it seems I was wrong to say you can just compare 
> DECL_NAME: the name ends up being different in case of typedefs like 
> in inherit/virtual14.C, so we do need to compare the type after all.
Thanks for calling this out. I’ll integrate such a case in my next 
iteration.

Simon



Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread Maciej W. Rozycki
On Tue, 15 Oct 2024, Richard Biener wrote:

> > FWIW, it *seems* that LRA just works with EV56 as the baseline and 
> > the
> > following replacements in the code:
> >
> > s/reload_in_progress/reload_in_progress || lra_in_progress/g
> 
> If you can provide -mlra vs. -mno-lra testsuite results as well that
> would be interesting.
> 
> Does "just work" mean you can build the compiler and its target
> libraries?  In this case
> I would suggest to go further and pull the trigger now, defaulting to
> LRA but allowing
> to switch back to reload for testing.  This is so the few people
> testing alpha at all can
> increase testing coverage - I don't think anybody runs older than EV5 HW.

 Well, I did run EV4 testing with real hardware recently.  As an example 
here's the summary of results for the C frontend only:

=== gcc Summary ===

# of expected passes		149119
# of unexpected failures	134
# of expected failures		1117
# of unresolved testcases	3
# of unsupported tests		3176

(I ran testing across the board).  If non-BWX is known to be broken with 
LRA, then perhaps let's only make the switch for BWX for the time being?

  Maciej


Re: [PATCH 1/5] arm: [MVE intrinsics] fix vst tests

2024-10-15 Thread Richard Earnshaw (lists)
On 16/09/2024 10:38, Christophe Lyon wrote:
> From: Alfie Richards 
> 
> The tests for vst* intrinsics use functions which return a void
> expression which can generate a warning. This hasn't come up previously
> as the inlining presumably prevents the warning.
> 
> This change removes the unnecessary and incorrect returns.
> 
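Schematically, the affected tests contained the following pattern (not the
exact test body):

#include <arm_mve.h>

/* Before: returning a void expression can trigger a warning.  */
void
foo (int8_t *base, int8x16_t value)
{
  return vst1q (base, value);
}

/* After: just call the intrinsic.  */
void
bar (int8_t *base, int8x16_t value)
{
  vst1q (base, value);
}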
> 2024-09-11  Alfie Richards 
> 
>   gcc/testsuite/
>   * gcc.target/arm/mve/intrinsics/vst1q_p_f16.c: Remove `return`.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst1q_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst2q_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_p_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_p_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_p_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_p_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_shifted_offset_p_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrdq_scatter_shifted_offset_p_u64.c:
>   Likewise.
>   * gcc.targ

Re: [PATCH 2/5] arm: [MVE intrinsics] Add load_ext intrinsic shape

2024-10-15 Thread Richard Earnshaw (lists)
On 16/09/2024 10:38, Christophe Lyon wrote:
> From: Alfie Richards 
> 
> This patch adds the extending load shape.
> It also adds/fixes comments for the load and store shapes.
> 
> 2024-09-11  Alfie Richards 
>   Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc:
>   (load_ext): New.
>   * config/arm/arm-mve-builtins-shapes.h:
>   (load_ext): New.

OK.

R.

> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 30 ---
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
> b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ba20c6a8f73..1783fcf4c31 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -1428,7 +1428,9 @@ struct inherent_def : public nonoverloaded_base
>  };
>  SHAPE (inherent)
>  
> -/* sv_t svfoo[_t0](const _t *)
> +/* _t vfoo[_t0](const _t *)
> +
> +   where  is the scalar name of .
>  
> Example: vld1q.
> int8x16_t [__arm_]vld1q[_s8](int8_t const *base)
> @@ -1460,6 +1462,24 @@ struct load_def : public overloaded_base<0>
>  };
>  SHAPE (load)
>  
> +/* _t foo_t0 (const _t *)
> +
> +   where  is determined by the function base name.
> +
> +   Example: vldrq.
> +   int32x4_t [__arm_]vldrwq_s32 (int32_t const *base)
> +   uint32x4_t [__arm_]vldrhq_z_u32 (uint16_t const *base, mve_pred16_t p)  */
> +struct load_ext_def : public nonoverloaded_base
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +build_all (b, "t0,al", group, MODE_none, preserve_user_namespace);
> +  }
> +};
> +SHAPE (load_ext)
> +
>  /* _t vfoo[_t0](_t)
> _t vfoo_n_t0(_t)
>  
> @@ -1509,14 +1529,18 @@ struct mvn_def : public overloaded_base<0>
>  };
>  SHAPE (mvn)
>  
> -/* void vfoo[_t0](_t *, v[xN]_t)
> +/* void vfoo[_t0](_t *, [xN]_t)
>  
> where  might be tied to  (for non-truncating stores) or might
> depend on the function base name (for truncating stores).
>  
> Example: vst1q.
> void [__arm_]vst1q[_s8](int8_t *base, int8x16_t value)
> -   void [__arm_]vst1q_p[_s8](int8_t *base, int8x16_t value, mve_pred16_t p)  
> */
> +   void [__arm_]vst1q_p[_s8](int8_t *base, int8x16_t value, mve_pred16_t p)
> +
> +   Example: vstrb.
> +   void [__arm_]vstrbq[_s16](int8_t *base, int16x8_t value)
> +   void [__arm_]vstrbq_p[_s16](int8_t *base, int16x8_t value, mve_pred16_t 
> p)  */
>  struct store_def : public overloaded_base<0>
>  {
>void
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
> b/gcc/config/arm/arm-mve-builtins-shapes.h
> index 61aa4fa73b3..45ed27ec920 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -62,6 +62,7 @@ namespace arm_mve
>  extern const function_shape *const create;
>  extern const function_shape *const inherent;
>  extern const function_shape *const load;
> +extern const function_shape *const load_ext;
>  extern const function_shape *const mvn;
>  extern const function_shape *const store;
>  extern const function_shape *const ternary;



[PATCH] tree-optimization/117147 - add testcase

2024-10-15 Thread Richard Biener
The following adds a testcase for the PR.

Pushed.

PR tree-optimization/117147
* gcc.dg/vect/pr117147.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr117147.c | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr117147.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr117147.c 
b/gcc/testsuite/gcc.dg/vect/pr117147.c
new file mode 100644
index 000..bc20fa8741b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr117147.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx512f" { target { x86_64-*-* i?86-*-* } } } */
+
+double Test(int p, double *sum3, double *sum4, double *wX)
+{
+  double tmp;
+  double bS1 = 0.;
+  double bS2 = 0.;
+  for (int i = 0; i < p; ++i)
+{
+  tmp = wX[i] * wX[i];
+  if (tmp != 0.0)
+   {
+ bS1 += sum3[i] * sum3[i] / (tmp * wX[i]);
+ bS2 += sum4[i] / tmp;
+   }
+}
+  return (bS2 + bS1);
+}
-- 
2.43.0


Re: [PATCH] libstdc++: Implement LWG 3798 for range adaptors [PR106676]

2024-10-15 Thread Jonathan Wakely
On Tue, 15 Oct 2024 at 14:30, Patrick Palka  wrote:
>
> On Mon, 14 Oct 2024, Jonathan Wakely wrote:
>
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > LWG 3798 modified the iterator_category of the iterator types for
> > transform_view, join_with_view, zip_transform_view and
> > adjacent_transform_view, to allow the iterator's reference type to be an
> > rvalue reference.
> >
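Condensed illustration of the effect, along the lines of the new check in
transform.cc (assumes C++20; not the verbatim testcase):

#include <ranges>
#include <vector>
#include <utility>
#include <iterator>
#include <type_traits>

std::vector<int> v{1, 2, 3};
// The transformation result is an rvalue reference; before LWG 3798 the
// iterator_category was demoted to input_iterator_tag, afterwards it
// follows the underlying random access iterator.
auto r = v | std::views::transform ([](int& i) -> int&& { return std::move (i); });
using It = decltype (r.begin ());
static_assert (std::is_same_v<std::iterator_traits<It>::iterator_category,
                              std::random_access_iterator_tag>);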
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/106676
> >   * include/bits/iterator_concepts.h (__cpp17_fwd_iterator): Use
> >   is_reference instead of is_lvalue_reference to allow
> >   rvalue references.
> >   * include/std/ranges (transform_view:__iter_cat::_S_iter_cat):
> >   Likewise.
> >   (zip_transform_view::__iter_cat::_S_iter_cat): Likewise.
> >   (adjacent_transform_view::__iter_cat::_S_iter_cat): Likewise.
> >   (join_with_view::__iter_cat::_S_iter_cat): Likewise.
> >   * testsuite/std/ranges/adaptors/transform.cc: Check
> >   iterator_category when the transformation function returns an
> >   rvalue reference type.
> > ---
> >  libstdc++-v3/include/bits/iterator_concepts.h|  4 +++-
> >  libstdc++-v3/include/std/ranges  | 16 
> >  .../testsuite/std/ranges/adaptors/transform.cc   | 16 
> >  3 files changed, 31 insertions(+), 5 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
> > b/libstdc++-v3/include/bits/iterator_concepts.h
> > index 490a362cdf1..669d3ddfd1e 100644
> > --- a/libstdc++-v3/include/bits/iterator_concepts.h
> > +++ b/libstdc++-v3/include/bits/iterator_concepts.h
> > @@ -333,10 +333,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   typename incrementable_traits<_Iter>::difference_type>;
> >   };
> >
> > +// _GLIBCXX_RESOLVE_LIB_DEFECTS
> > +// 3798. Rvalue reference and iterator_category
> >  template
> >concept __cpp17_fwd_iterator = __cpp17_input_iterator<_Iter>
> >   && constructible_from<_Iter>
> > - && is_lvalue_reference_v<iter_reference_t<_Iter>>
> > + && is_reference_v<iter_reference_t<_Iter>>
> >   && same_as<remove_cvref_t<iter_reference_t<_Iter>>,
> >  typename indirectly_readable_traits<_Iter>::value_type>
> >   && requires(_Iter __it)
> > diff --git a/libstdc++-v3/include/std/ranges 
> > b/libstdc++-v3/include/std/ranges
> > index f0d81cbea0c..941189d65c3 100644
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -1892,7 +1892,9 @@ namespace views::__adaptor
> >   using _Fpc = __detail::__maybe_const_t<_Const, _Fp>;
> >   using _Base = transform_view::_Base<_Const>;
> >   using _Res = invoke_result_t<_Fpc&, range_reference_t<_Base>>;
> > - if constexpr (is_lvalue_reference_v<_Res>)
> > + // _GLIBCXX_RESOLVE_LIB_DEFECTS
> > + // 3798. Rvalue reference and iterator_category
> > + if constexpr (is_reference_v<_Res>)
> > {
> >   using _Cat
> > = typename 
> > iterator_traits<iterator_t<_Base>>::iterator_category;
> > @@ -5047,7 +5049,9 @@ namespace views::__adaptor
> > using __detail::__range_iter_cat;
> > using _Res = invoke_result_t<__maybe_const_t<_Const, _Fp>&,
> >  
> > range_reference_t<__maybe_const_t<_Const, _Vs>>...>;
> > -   if constexpr (!is_lvalue_reference_v<_Res>)
> > +   // _GLIBCXX_RESOLVE_LIB_DEFECTS
> > +   // 3798. Rvalue reference and iterator_category
> > +   if constexpr (!is_reference_v<_Res>)
> >   return input_iterator_tag{};
> > else if constexpr ((derived_from<__range_iter_cat<_Vs, _Const>,
> >  random_access_iterator_tag> && 
> > ...))
> > @@ -5820,7 +5824,9 @@ namespace views::__adaptor
> >using _Res = invoke_result_t<__unarize<__maybe_const_t<_Const, 
> > _Fp>&, _Nm>,
> >  range_reference_t<_Base>>;
> >using _Cat = typename 
> > iterator_traits<iterator_t<_Base>>::iterator_category;
> > -  if constexpr (!is_lvalue_reference_v<_Res>)
> > +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> > +  // 3798. Rvalue reference and iterator_category
> > +  if constexpr (!is_reference_v<_Res>)
> >   return input_iterator_tag{};
> >else if constexpr (derived_from<_Cat, random_access_iterator_tag>)
> >   return random_access_iterator_tag{};
> > @@ -7228,7 +7234,9 @@ namespace views::__adaptor
> > using _OuterCat = typename 
> > iterator_traits<_OuterIter>::iterator_category;
> > using _InnerCat = typename 
> > iterator_traits<_InnerIter>::iterator_category;
> > using _PatternCat = typename 
> > iterator_traits<_PatternIter>::iterator_category;
> > -   if constexpr 
> > (!is_lvalue_reference_v<common_reference_t<iter_reference_t<_InnerIter>,
> > +   // _GLIBCXX_RESOLVE_LIB_DEFECTS
> > +   // 3798. Rvalue reference and iterator_category
> > +   if constexpr 
> > (!is_reference_v,
> > 
> > iter_reference_t<_PatternIter>>>)
>
> This line is misaligned

Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread Jeff Law




On 10/15/24 6:47 AM, Richard Biener wrote:

On Tue, Oct 15, 2024 at 2:41 PM John Paul Adrian Glaubitz
 wrote:


Hi Maciej,

On Tue, 2024-10-15 at 13:36 +0100, Maciej W. Rozycki wrote:

IMO, we should simply deprecate non-BWX targets. If reload is going
away, then there is no way for non-BWX targets to access reload
internals they require for compilation. As mentioned in the PR,
non-BWX targets are removed from distros anyway, so I guess there is
no point to invest much time to modernize them,


  Well, I have a lasting desire to keep non-BWX Alphas running, under Linux
in particular, and I'm going to look into any issues around it; reload vs
LRA is all software, so things can always be sorted one way or another.

  While I've been distracted by other matters lately, such as hardware
failures that had to be dealt with urgently, this is now my priority #1
and I do hope to have at least some critical initial stuff in with this
release cycle (noting that only ~5 weeks have left).


That's great.

While I'm not really an expert for compiler development, I have beefy hardware
available for GCC and kernel build tests, so if you have any patches for 
testing,
please let me know.


  NB I spoke to Richard about it while at LPC 2024 recently.


OK, good.

FWIW, it *seems* that LRA just works with EV56 as the baseline and the
following replacements in the code:

 s/reload_in_progress/reload_in_progress || lra_in_progress/g


If you can provide -mlra vs. -mno-lra testsuite results as well that
would be interesting.

Does "just work" mean you can build the compiler and its target
libraries?  In this case
I would suggest to go further and pull the trigger now, defaulting to
LRA but allowing
to switch back to reload for testing.  This is so the few people
testing alpha at all can
increase testing coverage - I don't think anybody runs older than EV5 HW.

Is VMS on alpha still a thing btw?  I still see it mentioned in config.gcc
Also note if we think it's basically working I can flip my tester to 
default to LRA.  It bootstraps and regtests alpha once a week via qemu.


I think it's testing the baseline configuration, so presumably non-BWX 
variants.  That can probably be adjusted if necessary.


Jeff


Re: [PATCH] RISC-V: Use biggest_mode as mode for constants.

2024-10-15 Thread Jeff Law




On 10/15/24 6:55 AM, Robin Dapp wrote:

Hi,

in compute_nregs_for_mode we expect that the current variable's mode is
at most as large as the biggest mode to be used for vectorization.

This might not be true for constants as they don't actually have a mode.
In that case, just use the biggest mode so max_number_of_live_regs
returns 1.
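
Schematically (the actual riscv-vector-costs.cc hunk isn't quoted here;
this is just the shape of the fix, with hypothetical variable names):

  machine_mode mode;
  if (TREE_CODE (var) == SSA_NAME)
    mode = TYPE_MODE (TREE_TYPE (var));
  else
    /* Constants carry no meaningful mode of their own; clamp to
       biggest_mode so compute_nregs_for_mode's precondition
       (mode <= biggest_mode) holds and the live-register count is 1.  */
    mode = biggest_mode;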

This fixes several test cases in the test suite.

Regtested on rv64gcv and letting the CI work out the rest.

Regards
  Robin

gcc/ChangeLog:

PR target/116655

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
Use biggest mode instead of constant's saved mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116655.c: New test.

Seems reasonable.

jeff



Re: [PATCH 3/5] arm: [MVE intrinsics] Add load_extending and store_truncating function bases

2024-10-15 Thread Richard Earnshaw (lists)
On 16/09/2024 10:38, Christophe Lyon wrote:
> From: Alfie Richards 
> 
> This patch adds the load_extending and store_truncating function bases
> for MVE intrinsics.
> 
> The constructors have parameters describing the memory element
> type/width which is part of the function base name (e.g. "h" in
> vldrhq).
> 
> 2024-09-11  Alfie Richards 
> 
>   gcc/
> 
>   * config/arm/arm-mve-builtins-functions.h
>   (load_extending): New class.
>   (store_truncating): New class.
>   * config/arm/arm-protos.h (arm_mve_data_mode): New helper function.
>   * config/arm/arm.cc (arm_mve_data_mode): New helper function.

This patch is technically ok, but there are some formatting issues that make 
the code layout slightly confusing and hard to read:

+return arm_mve_data_mode (GET_MODE_INNER (mem_mode),
+ GET_MODE_NUNITS (reg_mode))
+  .require ();

The stray ".require ();" on its own looks strange given the indentation.  Your 
line is short enough that you can write 

+return arm_mve_data_mode (GET_MODE_INNER (mem_mode),
+  GET_MODE_NUNITS (reg_mode)).require ();


+unsigned int element_bits = GET_MODE_BITSIZE (
+  (fi.type_suffix (0).integer_p
+   ? m_to_int_mode
+   : m_to_float_mode.require ()));

Here you should put "= GET_MODE_BITSIZE (" on the following line, then indent 
the arguments to the opening paren of the function arguments:

+unsigned int element_bits
+  = GET_MODE_BITSIZE (fi.type_suffix (0).integer_p
+  ? m_to_int_mode
+  : m_to_float_mode.require ());

And you can then lose the extra level of parenthesis.

+return arm_mve_data_mode (
+  (fi.type_suffix (0).integer_p
+   ? m_to_int_mode
+   : m_to_float_mode.require ()),
+  nunits)
+  .require ();

In this case I'd split the selection operation into a separate statement, 
giving (if I've got the type correct):

+scalar_mode mode = (fi.type_suffix (0).integer_p
+? m_to_int_mode
+: m_to_float_mode.require ());
+return arm_mve_data_mode (mode, nunits).require ();

OK with those changes.

R.

> ---
>  gcc/config/arm/arm-mve-builtins-functions.h | 106 
>  gcc/config/arm/arm-protos.h |   3 +
>  gcc/config/arm/arm.cc   |  15 +++
>  3 files changed, 124 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
> b/gcc/config/arm/arm-mve-builtins-functions.h
> index ac2a731bff4..e47bc69936e 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -20,6 +20,8 @@
>  #ifndef GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
>  #define GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
>  
> +#include "arm-protos.h"
> +
>  namespace arm_mve {
>  
>  /* Wrap T, which is derived from function_base, and indicate that the
> @@ -1024,6 +1026,110 @@ public:
>}
>  };
>  
> +/* A function_base that loads elements from memory and extends them
> +   to a wider element.  The memory element type is a fixed part of
> +   the function base name.  */
> +class load_extending : public function_base
> +{
> +public:
> +  CONSTEXPR load_extending (type_suffix_index signed_memory_type,
> + type_suffix_index unsigned_memory_type,
> + type_suffix_index float_memory_type)
> +: m_signed_memory_type (signed_memory_type),
> +  m_unsigned_memory_type (unsigned_memory_type),
> +  m_float_memory_type (float_memory_type)
> +  {}
> +  CONSTEXPR load_extending (type_suffix_index signed_memory_type,
> + type_suffix_index unsigned_memory_type)
> +: m_signed_memory_type (signed_memory_type),
> +  m_unsigned_memory_type (unsigned_memory_type),
> +  m_float_memory_type (NUM_TYPE_SUFFIXES)
> +  {}
> +
> +  unsigned int call_properties (const function_instance &) const override
> +  {
> +return CP_READ_MEMORY;
> +  }
> +
> +  tree memory_scalar_type (const function_instance &fi) const override
> +  {
> +type_suffix_index memory_type_suffix
> +  = (fi.type_suffix (0).integer_p
> +  ? (fi.type_suffix (0).unsigned_p
> + ? m_unsigned_memory_type
> + : m_signed_memory_type)
> +  : m_float_memory_type);
> +return scalar_types[type_suffixes[memory_type_suffix].vector_type];
> +  }
> +
> +  machine_mode memory_vector_mode (const function_instance &fi) const 
> override
> +  {
> +type_suffix_index memory_type_suffix
> +  = (fi.type_suffix (0).integer_p
> +  ? (fi.type_suffix (0).unsigned_p
> + ? m_unsigned_memory_type
> + : m_signed_memory_type)
> +  : m_float_memory_type);
> +machine_mode mem_mode = type_suffixes[memory_type_suffix].vector_mode;
> +machine_mode reg_mode = fi.vector_mode (0);
> +
> +return arm_mve_data_mode (GET_MODE_INNER (mem_mode),
> +   GET_MODE_NUNITS (reg_mode))
> +  .require ();

Re: [PATCH 2/2] c++: constrained auto NTTP vs associated constraints

2024-10-15 Thread Patrick Palka
On Tue, 15 Oct 2024, Patrick Palka wrote:

> According to [temp.param]/11, the constraint on an auto NTTP is an
> associated constraint and so should be checked as part of satisfaction
> of the overall associated constraints rather than checked individually
> during coercion/deduction.

By the way, I wonder if such associated constraints should be relevant for
subsumption now?

template<class T> concept C = true;

template<class T> concept D = C<T> && true;

template<C auto V> void f(); // #1
template<D auto V> void f(); // #2

int main() {
  f<0>(); // still ambiguous?
}

With this patch the above call is still ambiguous despite #2 now being
more constrained than #1 because "more constrained" is only considered for
function templates with the same signatures as per
https://eel.is/c++draft/temp.func.order#6.2.2 and we deem their signatures
to be different due to the different type-constraint.

MSVC also rejects, but Clang accepts and selects #2.

> 
> In order to implement this we mainly need to make handling of
> constrained auto NTTPs go through finish_constrained_parameter so that
> TEMPLATE_PARMS_CONSTRAINTS gets set on them.
> 
> gcc/cp/ChangeLog:
> 
>   * constraint.cc (finish_shorthand_constraint): Add is_non_type
>   parameter for handling constrained (auto) NTTPS.
>   * cp-tree.h (do_auto_deduction): Adjust declaration.
>   (copy_template_args): Declare.
>   (finish_shorthand_constraint): Adjust declaration.
>   * parser.cc (cp_parser_constrained_type_template_parm): Inline
>   into its only caller and remove.
>   (cp_parser_constrained_non_type_template_parm): Likewise.
>   (finish_constrained_parameter): Simplify after the above.
>   (cp_parser_template_parameter): Dispatch to
>   finish_constrained_parameter for a constrained auto NTTP.
>   * pt.cc (process_template_parm): Pass is_non_type to
>   finish_shorthand_constraint.
>   (convert_template_argument): Adjust call to do_auto_deduction.
>   (copy_template_args): Remove static.
>   (unify): Adjust call to do_auto_deduction.
>   (make_constrained_placeholder_type): Return the type not the
>   TYPE_NAME for consistency with make_auto etc.
>   (do_auto_deduction): Remove now unused tmpl parameter.  Don't
>   check constraints on an auto NTTP even in a non-template
>   context.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-placeholder12.C: Adjust expected error
>   upon constrained auto NTTP satisfaction failure.
>   * g++.dg/cpp2a/concepts-pr97093.C: Likewise.
>   * g++.dg/cpp2a/concepts-template-parm2.C: Likewise.
>   * g++.dg/cpp2a/concepts-template-parm6.C: Likewise.
> ---
>  gcc/cp/constraint.cc  | 32 +--
>  gcc/cp/cp-tree.h  |  6 +--
>  gcc/cp/parser.cc  | 54 +++
>  gcc/cp/pt.cc  | 35 +---
>  .../g++.dg/cpp2a/concepts-placeholder12.C |  4 +-
>  gcc/testsuite/g++.dg/cpp2a/concepts-pr97093.C |  2 +-
>  .../g++.dg/cpp2a/concepts-template-parm2.C|  2 +-
>  .../g++.dg/cpp2a/concepts-template-parm6.C|  2 +-
>  8 files changed, 66 insertions(+), 71 deletions(-)
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 35be9cc2b41..9394bea8835 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -1189,7 +1189,7 @@ build_constrained_parameter (tree cnc, tree proto, tree 
> args)
> done only after the requires clause has been parsed (or not).  */
>  
>  tree
> -finish_shorthand_constraint (tree decl, tree constr)
> +finish_shorthand_constraint (tree decl, tree constr, bool is_non_type)
>  {
>/* No requirements means no constraints.  */
>if (!constr)
> @@ -1198,9 +1198,22 @@ finish_shorthand_constraint (tree decl, tree constr)
>if (error_operand_p (constr))
>  return NULL_TREE;
>  
> -  tree proto = CONSTRAINED_PARM_PROTOTYPE (constr);
> -  tree con = CONSTRAINED_PARM_CONCEPT (constr);
> -  tree args = CONSTRAINED_PARM_EXTRA_ARGS (constr);
> +  tree proto, con, args;
> +  if (is_non_type)
> +{
> +  tree id = PLACEHOLDER_TYPE_CONSTRAINTS (constr);
> +  tree tmpl = TREE_OPERAND (id, 0);
> +  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (tmpl);
> +  proto = TREE_VALUE (TREE_VEC_ELT (parms, 0));
> +  con = DECL_TEMPLATE_RESULT (tmpl);
> +  args = TREE_OPERAND (id, 1);
> +}
> +  else
> +{
> +  proto = CONSTRAINED_PARM_PROTOTYPE (constr);
> +  con = CONSTRAINED_PARM_CONCEPT (constr);
> +  args = CONSTRAINED_PARM_EXTRA_ARGS (constr);
> +}
>  
>bool variadic_concept_p = template_parameter_pack_p (proto);
>bool declared_pack_p = template_parameter_pack_p (decl);
> @@ -1214,7 +1227,16 @@ finish_shorthand_constraint (tree decl, tree constr)
>  
>/* Build the concept constraint-expression.  */
>tree tmpl = DECL_TI_TEMPLATE (con);
> -  tree check = build_concept_check (tmpl, arg, args, tf_warning_or_error);

Re: [PATCH 4/5] arm: [MVE intrinsics] Add support for predicated contiguous loads and stores

2024-10-15 Thread Richard Earnshaw (lists)
On 16/09/2024 10:38, Christophe Lyon wrote:
> From: Alfie Richards 
> 
> This patch extends
> function_expander::use_contiguous_load_insn and
> function_expander::use_contiguous_store_insn functions to
> support predicated versions.
> 
> 2024-09-11  Alfie Richards  
>   Christophe Lyon  
> 
>   gcc/
> 
>   * config/arm/arm-mve-builtins.cc
>   (function_expander::use_contiguous_load_insn): Add support for
>   PRED_z.
>   (function_expander::use_contiguous_store_insn): Add support for
>   PRED_p.

OK.

R.

> ---
>  gcc/config/arm/arm-mve-builtins.cc | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> b/gcc/config/arm/arm-mve-builtins.cc
> index 7e8217666fe..f519fded000 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -2237,6 +2237,8 @@ function_expander::use_contiguous_load_insn (insn_code 
> icode)
>  
>add_output_operand (icode);
>add_mem_operand (mem_mode, get_contiguous_base ());
> +  if (pred == PRED_z)
> +add_input_operand (icode, args[1]);
>return generate_insn (icode);
>  }
>  
> @@ -2249,6 +2251,8 @@ function_expander::use_contiguous_store_insn (insn_code 
> icode)
>  
>add_mem_operand (mem_mode, get_contiguous_base ());
>add_input_operand (icode, args[1]);
> +  if (pred == PRED_p)
> +add_input_operand (icode, args[2]);
>return generate_insn (icode);
>  }
>  



Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread John Paul Adrian Glaubitz
On Tue, 2024-10-15 at 07:56 -0600, Jeff Law wrote:
> Also note if we think it's basically working I can flip my tester to 
> default to LRA.  It bootstraps and regtests alpha once a week via qemu.
> 
> I think it's testing the baseline configuration, so presumably non-BWX 
> variants.  That can probably be adjusted if necessary.

It does seem to fail when enabling M2 though:

m2/pge -k -l ../../gcc/m2/gm2-compiler/P2Build.bnf -o 
m2/gm2-compiler-boot/P2Build.mod
m2/pge -k -l ../../gcc/m2/gm2-compiler/P3Build.bnf -o 
m2/gm2-compiler-boot/P3Build.mod
m2/pge -k -l ../../gcc/m2/gm2-compiler/PHBuild.bnf -o 
m2/gm2-compiler-boot/PHBuild.mod
m2/pge -k -l ../../gcc/m2/gm2-compiler/PCBuild.bnf -o 
m2/gm2-compiler-boot/PCBuild.mod
m2/pge -k -l ../../gcc/m2/gm2-compiler/P1Build.bnf -o 
m2/gm2-compiler-boot/P1Build.mod
m2/pge -k -l ../../gcc/m2/gm2-compiler/P0SyntaxCheck.bnf -o 
m2/gm2-compiler-boot/P0SyntaxCheck.mod
terminate called after throwing an instance of 'unsigned int'
make[3]: *** [../../gcc/m2/Make-lang.in:1778: m2/gm2-compiler-boot/P2Build.mod] 
Aborted
make[3]: *** Deleting file 'm2/gm2-compiler-boot/P2Build.mod'
make[3]: *** Waiting for unfinished jobs....
terminate called after throwing an instance of 'unsigned int'
make[3]: *** [../../gcc/m2/Make-lang.in:1778: 
m2/gm2-compiler-boot/P0SyntaxCheck.mod] Aborted
make[3]: *** Deleting file 'm2/gm2-compiler-boot/P0SyntaxCheck.mod'
terminate called after throwing an instance of 'unsigned int'
make[3]: *** [../../gcc/m2/Make-lang.in:1778: m2/gm2-compiler-boot/P1Build.mod] 
Aborted
make[3]: *** Deleting file 'm2/gm2-compiler-boot/P1Build.mod'
terminate called after throwing an instance of 'unsigned int'
make[3]: *** [../../gcc/m2/Make-lang.in:1778: m2/gm2-compiler-boot/P3Build.mod] 
Aborted
make[3]: *** Deleting file 'm2/gm2-compiler-boot/P3Build.mod'
terminate called after throwing an instance of 'unsigned int'
make[3]: *** [../../gcc/m2/Make-lang.in:1778: m2/gm2-compiler-boot/PHBuild.mod] 
Aborted
make[3]: *** Deleting file 'm2/gm2-compiler-boot/PHBuild.mod'
terminate called after throwing an instance of 'unsigned int'
make[3]: *** [../../gcc/m2/Make-lang.in:1778: m2/gm2-compiler-boot/PCBuild.mod] 
Aborted
make[3]: *** Deleting file 'm2/gm2-compiler-boot/PCBuild.mod'

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


[PATCH 2/7] libstdc++: Make __normal_iterator constexpr, always_inline, nodiscard

2024-10-15 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

The __gnu_cxx::__normal_iterator type we use for std::vector::iterator
is not specified by the standard; it's an implementation detail. This
means it's not constrained by the rule that forbids strengthening
constexpr. We can make it meet the constexpr iterator requirements for
older standards, not only when it's required to be for C++20.

For the non-const member functions they can't be constexpr in C++11, so
use _GLIBCXX14_CONSTEXPR for those. For all constructors, const members
and non-member operator overloads, use _GLIBCXX_CONSTEXPR or just
constexpr.

We can also liberally add [[nodiscard]] and [[gnu::always_inline]]
attributes to those functions.

Also change some internal helpers for std::move_iterator which can be
unconditionally constexpr and marked nodiscard.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (__normal_iterator): Make all
members and overloaded operators constexpr before C++20.
(__niter_base, __niter_wrap, __to_address): Add nodiscard
and always_inline attributes.
(__make_move_if_noexcept_iterator, __miter_base): Add nodiscard
and make unconditionally constexpr.
---
 libstdc++-v3/include/bits/stl_iterator.h | 125 ++-
 1 file changed, 76 insertions(+), 49 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 85b9861..3cc10a160bd 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -656,7 +656,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _Iterator>
 _GLIBCXX20_CONSTEXPR
-auto
+inline auto
 __niter_base(reverse_iterator<_Iterator> __it)
 -> decltype(__make_reverse_iterator(__niter_base(__it.base(
 { return __make_reverse_iterator(__niter_base(__it.base())); }
@@ -668,7 +668,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _Iterator>
 _GLIBCXX20_CONSTEXPR
-auto
+inline auto
 __miter_base(reverse_iterator<_Iterator> __it)
 -> decltype(__make_reverse_iterator(__miter_base(__it.base(
 { return __make_reverse_iterator(__miter_base(__it.base())); }
@@ -1060,23 +1060,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using iterator_concept = std::__detail::__iter_concept<_Iterator>;
 #endif
 
-  _GLIBCXX_CONSTEXPR __normal_iterator() _GLIBCXX_NOEXCEPT
-  : _M_current(_Iterator()) { }
+  __attribute__((__always_inline__))
+  _GLIBCXX_CONSTEXPR
+  __normal_iterator() _GLIBCXX_NOEXCEPT
+  : _M_current() { }
 
-  explicit _GLIBCXX20_CONSTEXPR
+  __attribute__((__always_inline__))
+  explicit _GLIBCXX_CONSTEXPR
   __normal_iterator(const _Iterator& __i) _GLIBCXX_NOEXCEPT
   : _M_current(__i) { }
 
   // Allow iterator to const_iterator conversion
 #if __cplusplus >= 201103L
   template<typename _Iter, typename = _Convertible<_Iter>>
-   _GLIBCXX20_CONSTEXPR
+   [[__gnu__::__always_inline__]]
+   constexpr
__normal_iterator(const __normal_iterator<_Iter, _Container>& __i)
noexcept
 #else
   // N.B. _Container::pointer is not actually in container requirements,
   // but is present in std::vector and std::basic_string.
   template<typename _Iter>
+   __attribute__((__always_inline__))
 __normal_iterator(const __normal_iterator<_Iter,
  typename __enable_if<
   (std::__are_same<_Iter, typename _Container::pointer>::__value),
@@ -1085,17 +1090,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _M_current(__i.base()) { }
 
   // Forward iterator requirements
-  _GLIBCXX20_CONSTEXPR
+
+  __attribute__((__always_inline__)) _GLIBCXX_NODISCARD
+  _GLIBCXX_CONSTEXPR
   reference
   operator*() const _GLIBCXX_NOEXCEPT
   { return *_M_current; }
 
-  _GLIBCXX20_CONSTEXPR
+  __attribute__((__always_inline__)) _GLIBCXX_NODISCARD
+  _GLIBCXX_CONSTEXPR
   pointer
   operator->() const _GLIBCXX_NOEXCEPT
   { return _M_current; }
 
-  _GLIBCXX20_CONSTEXPR
+  __attribute__((__always_inline__))
+  _GLIBCXX14_CONSTEXPR
   __normal_iterator&
   operator++() _GLIBCXX_NOEXCEPT
   {
@@ -1103,13 +1112,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *this;
   }
 
-  _GLIBCXX20_CONSTEXPR
+  __attribute__((__always_inline__))
+  _GLIBCXX14_CONSTEXPR
   __normal_iterator
   operator++(int) _GLIBCXX_NOEXCEPT
   { return __normal_iterator(_M_current++); }
 
   // Bidirectional iterator requirements
-  _GLIBCXX20_CONSTEXPR
+
+  __attribute__((__always_inline__))
+  _GLIBCXX14_CONSTEXPR
   __normal_iterator&
   operator--() _GLIBCXX_NOEXCEPT
   {
@@ -1117,38 +1129,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *this;
   }
 
-  _GLIBCXX20_CONSTEXPR
+  __attribute__((__always_inline__))
+  _GLIBCXX14_CONSTEXPR
   __normal_iterator
   operator--(int) _GLIBCXX_NOEXCEPT
   { return __normal_iterator(_M_current--); }
 
   // Random access iterator requirements

[PATCH 4/7] libstdc++: Remove indirection to __find_if in std::find etc.

2024-10-15 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

There doesn't seem to be a lot of benefit in reusing __find_if with
__gnu_cxx::__ops predicates, since they aren't going to actually
instantiate any less code if we use different predicates every time
(e.g. __ops::__negate, or __ops::__iter_equals_val, or
__ops::__pred_iter).

And now that std::find no longer calls __find_if (because it just does a
loop directly), we can make the _Iter_equals_val case of __find_if call
std::find, to take advantage of its memchr optimization. This benefits
other algos like search_n which use __find_if with _Iter_equals_val.
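
A rough sketch of the shape of that __find_if overload (simplified here;
the real change is in the stl_algobase.h hunk, and _M_value is the member
of the existing __gnu_cxx::__ops::_Iter_equals_val helper):

  template<typename _Iterator, typename _Value>
    _GLIBCXX20_CONSTEXPR
    inline _Iterator
    __find_if(_Iterator __first, _Iterator __last,
              __gnu_cxx::__ops::_Iter_equals_val<_Value> __pred)
    {
      // Dispatch to std::find so its memchr fast path applies here too.
      return std::find(__first, __last, __pred._M_value);
    }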

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h (__find_if_not): Do loop here instead
of using __find_if with __gnu_cxx::__ops predicate.
(find_if): Likewise.
(find): Move to ...
* include/bits/stl_algobase.h (find): ... here.
(__find_if): Overload for _Iter_equals_val predicate.
---
 libstdc++-v3/include/bits/stl_algo.h | 63 +++-
 libstdc++-v3/include/bits/stl_algobase.h | 61 +++
 2 files changed, 68 insertions(+), 56 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 489ce7e14d2..05c1dbd07b6 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -105,15 +105,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::iter_swap(__result, __b);
 }
 
-  /// Provided for stable_partition to use.
+  // Used by std::find_if_not and __stable_partition.
   template<typename _InputIterator, typename _Predicate>
 _GLIBCXX20_CONSTEXPR
 inline _InputIterator
 __find_if_not(_InputIterator __first, _InputIterator __last,
  _Predicate __pred)
 {
-  return std::__find_if(__first, __last,
-   __gnu_cxx::__ops::__negate(__pred));
+  while (__first != __last && __pred(__first))
+   ++__first;
+  return __first;
 }
 
   /// Like find_if_not(), but uses and updates a count of the
@@ -3810,57 +3811,6 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 }
 #endif // C++17
 
-  /**
-   *  @brief Find the first occurrence of a value in a sequence.
-   *  @ingroup non_mutating_algorithms
-   *  @param  __first  An input iterator.
-   *  @param  __last   An input iterator.
-   *  @param  __valThe value to find.
-   *  @return   The first iterator @c i in the range @p [__first,__last)
-   *  such that @c *i == @p __val, or @p __last if no such iterator exists.
-  */
-  template<typename _InputIterator, typename _Tp>
-_GLIBCXX20_CONSTEXPR
-inline _InputIterator
-find(_InputIterator __first, _InputIterator __last, const _Tp& __val)
-{
-  // concept requirements
-  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
-  __glibcxx_function_requires(_EqualOpConcept<
-   typename iterator_traits<_InputIterator>::value_type, _Tp>)
-  __glibcxx_requires_valid_range(__first, __last);
-
-#if __cpp_if_constexpr && __glibcxx_type_trait_variable_templates
-  using _ValT = typename iterator_traits<_InputIterator>::value_type;
-  if constexpr (__can_use_memchr_for_find<_ValT, _Tp>)
-   if constexpr (is_pointer_v<_InputIterator>
-#if __cpp_lib_concepts
-   || contiguous_iterator<_InputIterator>
-#endif
-)
- {
-   // If conversion to the 1-byte value_type alters the value,
-   // it would not be found by std::find using equality comparison.
-   // We need to check this here, because otherwise something like
-   // memchr("a", 'a'+256, 1) would give a false positive match.
-   if (!(static_cast<_ValT>(__val) == __val))
- return __last;
-   else if (!__is_constant_evaluated())
- {
-   const void* __p0 = std::__to_address(__first);
-   const int __ival = static_cast<int>(__val);
-   if (auto __n = std::distance(__first, __last); __n > 0)
- if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
-   return __first + ((const char*)__p1 - (const char*)__p0);
-   return __last;
- }
- }
-#endif
-
-  return std::__find_if(__first, __last,
-   __gnu_cxx::__ops::__iter_equals_val(__val));
-}
-
   /**
*  @brief Find the first element in a sequence for which a
* predicate is true.
@@ -3883,8 +3833,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
  typename iterator_traits<_InputIterator>::value_type>)
   __glibcxx_requires_valid_range(__first, __last);
 
-  return std::__find_if(__first, __last,
-   __gnu_cxx::__ops::__pred_iter(__pred));
+  while (__first != __last && !__pred(*__first))
+   ++__first;
+  return __first;
 }
 
   /**
diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 5f77b00be9b..34e1cf7322f 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h

[PATCH 7/7] libstdc++: Reuse std::__assign_one in <bits/ranges_algobase.h>

2024-10-15 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

Use std::__assign_one instead of ranges::__assign_one. Adjust the uses,
because std::__assign_one has the arguments in the opposite order (the
same order as an assignment expression).
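
For reference, a sketch of the std::__assign_one helper being reused (as
added earlier in this series; simplified, with the output parameter first,
mirroring "*out = *in"):

  template<bool _IsMove, typename _Out, typename _Iter>
    constexpr void
    __assign_one(_Out& __result, _Iter& __iter)
    {
      if constexpr (_IsMove)
        *__result = std::move(*__iter);
      else
        *__result = *__iter;
    }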

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h (ranges::__assign_one): Remove.
(__copy_or_move, __copy_or_move_backward): Use std::__assign_one
instead of ranges::__assign_one.
---
 libstdc++-v3/include/bits/ranges_algobase.h | 22 ++---
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index 0345ea850a4..df4e770e7a6 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -225,16 +225,6 @@ namespace ranges
  copy_backward_result<_Iter, _Out>>
 __copy_or_move_backward(_Iter __first, _Sent __last, _Out __result);
 
-  template<bool _IsMove, typename _Iter, typename _Out>
-constexpr void
-__assign_one(_Iter& __iter, _Out& __result)
-{
-  if constexpr (_IsMove)
- *__result = std::move(*__iter);
-  else
- *__result = *__iter;
-}
-
   template<bool _IsMove,
            input_iterator _Iter, sentinel_for<_Iter> _Sent,
            weakly_incrementable _Out>
@@ -294,14 +284,14 @@ namespace ranges
__builtin_memmove(__result, __first,
  sizeof(_ValueTypeI) * __num);
  else if (__num == 1)
-   ranges::__assign_one<_IsMove>(__first, __result);
+   std::__assign_one<_IsMove>(__result, __first);
  return {__first + __num, __result + __num};
}
}
 
  for (auto __n = __last - __first; __n > 0; --__n)
{
- ranges::__assign_one<_IsMove>(__first, __result);
+ std::__assign_one<_IsMove>(__result, __first);
  ++__first;
  ++__result;
}
@@ -311,7 +301,7 @@ namespace ranges
{
  while (__first != __last)
{
- ranges::__assign_one<_IsMove>(__first, __result);
+ std::__assign_one<_IsMove>(__result, __first);
  ++__first;
  ++__result;
}
@@ -423,7 +413,7 @@ namespace ranges
__builtin_memmove(__result, __first,
  sizeof(_ValueTypeI) * __num);
  else if (__num == 1)
-   ranges::__assign_one<_IsMove>(__first, __result);
+   std::__assign_one<_IsMove>(__result, __first);
  return {__first + __num, __result};
}
}
@@ -435,7 +425,7 @@ namespace ranges
{
  --__tail;
  --__result;
- ranges::__assign_one<_IsMove>(__tail, __result);
+ std::__assign_one<_IsMove>(__result, __tail);
}
  return {std::move(__lasti), std::move(__result)};
}
@@ -448,7 +438,7 @@ namespace ranges
{
  --__tail;
  --__result;
- ranges::__assign_one<_IsMove>(__tail, __result);
+ std::__assign_one<_IsMove>(__result, __tail);
}
  return {std::move(__lasti), std::move(__result)};
}
-- 
2.46.2



[PATCH 5/7] libstdc++: Add nodiscard to std::find

2024-10-15 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (find): Add nodiscard.
---
 libstdc++-v3/include/bits/stl_algobase.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 34e1cf7322f..d9d1d00b113 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -2087,7 +2087,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
*  such that @c *i == @p __val, or @p __last if no such iterator exists.
   */
   template<typename _InputIterator, typename _Tp>
-_GLIBCXX20_CONSTEXPR
+_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
 inline _InputIterator
 find(_InputIterator __first, _InputIterator __last, const _Tp& __val)
 {
-- 
2.46.2



[PATCH 1/7] libstdc++: Refactor std::uninitialized_{copy, fill, fill_n} algos [PR68350]

2024-10-15 Thread Jonathan Wakely
This is v2 of
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665246.html
fixing some thinkos in uninitialized_{fill,fill_n}. We don't need to
worry about overwriting tail-padding in those algos, because we only use
memset for 1-byte integer types. So they have no tail padding that can
be reused anyway! So this changes __n > 1 to __n > 0 in a few places
(which fixes the problem that it was not actually filling anything for
the n==1 cases).

Also simplify std::__to_address(__result++) to just __result++ because
we already have a pointer, and use std::to_address(result++) for a C++20
std::contiguous_iterator case, instead of addressof(*result++).

Tested x86_64-linux.

-- >8 --

This refactors the std::uninitialized_copy, std::uninitialized_fill and
std::uninitialized_fill_n algorithms to directly perform memcpy/memset
optimizations instead of dispatching to std::copy/std::fill/std::fill_n.

The reasons for this are:

- Use 'if constexpr' to simplify and optimize compilation throughput, so
  dispatching to specialized class templates is only needed for C++98
  mode.
- Relax the conditions for using memcpy/memset, because the C++20 rules
  on implicit-lifetime types mean that we can rely on memcpy to begin
  lifetimes of trivially copyable types.  We don't need to require
  trivially default constructible, so don't need to limit the
  optimization to trivial types. See PR 68350 for more details.
- The conditions on non-overlapping ranges are stronger for
  std::uninitialized_copy than for std::copy so we can use memcpy instead
  of memmove, which might be a minor optimization.
- Avoid including <bits/stl_algobase.h> in <bits/stl_uninitialized.h>.
  It only needs some iterator utilities from that file now, which belong
  in <bits/stl_iterator.h> anyway, so this moves them there.

Several tests need changes to the diagnostics matched by dg-error
because we no longer use the __constructible() function that had a
static assert in. Now we just get straightforward errors for attempting
to use a deleted constructor.

Two tests needed more significant changes to the actual expected results
of executing the tests, because they were checking for old behaviour
which was incorrect according to the standard.
20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
std::copy to be used for a call to std::uninitialized_copy involving two
trivially copyable types. That was incorrect behaviour, because a
non-trivial constructor should have been used, but using std::copy used
trivial default initialization followed by assignment.
20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
the behaviour with a non-integral Size passed to uninitialized_fill_n,
but I wrote the test looking at the requirements of uninitialized_copy_n
which are not the same as uninitialized_fill_n. The former uses --n and
tests n > 0, but the latter just tests n-- (which will never be false
for a floating-point value with a fractional part).
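
To make the Size difference concrete (illustration only, not testsuite
code), consider a fractional count:

  // fill_n-style test: n-- yields 2.5, 1.5, 0.5, -0.5, ... and is never
  // zero, so the condition is never false and the loop never terminates.
  for (double n = 2.5; n--; )
    { }

  // copy_n-style test: runs for n == 2.5, 1.5, 0.5 and then stops.
  for (double n = 2.5; n > 0; --n)
    { }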

libstdc++-v3/ChangeLog:

PR libstdc++/68350
PR libstdc++/93059
* include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
to ...
* include/bits/stl_iterator.h: ... here.
* include/bits/stl_uninitialized.h (__check_constructible)
(_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
[C++98] (__unwrappable_niter): New trait.
(__uninitialized_copy): Replace use of std::copy.
(uninitialized_copy): Fix Doxygen comments. Open-code memcpy
optimization for C++11 and later.
(__uninitialized_fill): Replace use of std::fill.
(uninitialized_fill): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
(__uninitialized_fill_n): Replace use of std::fill_n.
(uninitialized_fill_n): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
Adjust expected behaviour to match what the standard specifies.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error directives.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
Likewise.
* 
testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc:
Likewise.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc:
Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Likewise.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Likewise.
---
 libstdc++-v3/include/bits/stl_algobase.h  |  45 --
 libstdc++-v3/include/bits/stl_iterator.h  |  54 +++
 libstdc++-v3/include/bits/stl_uninitialized.h | 385 +-
 .../uninitialized_copy/1.cc   |   3 +-
 .../uninitialized_copy/64476.cc  

Re: [PATCH 5/5] arm: [MVE intrinsics] Rework MVE vld/vst intrinsics

2024-10-15 Thread Richard Earnshaw (lists)
On 16/09/2024 10:38, Christophe Lyon wrote:
> From: Alfie Richards 
> 
> Implement the mve vld and vst intrinsics using the MVE builtins framework.
> 
> The main part of the patch is to reimplement to vstr/vldr patterns
> such that we now have much fewer of them:
> - non-truncating stores
> - predicated non-truncating stores
> - truncating stores
> - predicated truncating stores
> - non-extending loads
> - predicated non-extending loads
> - extending loads
> - predicated extending loads
> 
> This enables us to update the implementation of vld1/vst1 and use the
> new vldr/vstr builtins.
> 
> The patch also adds support for the predicated vld1/vst1 versions.
> 
> 2024-09-11  Alfie Richards  
>   Christophe Lyon  
> 
>   gcc/
> 
>   * config/arm/arm-mve-builtins-base.cc (vld1q_impl): Add support
>   for predicated version.
>   (vst1q_impl): Likewise.
>   (vstrq_impl): New class.
>   (vldrq_impl): New class.
>   (vldrbq): New.
>   (vldrhq): New.
>   (vldrwq): New.
>   (vstrbq): New.
>   (vstrhq): New.
>   (vstrwq): New.
>   * config/arm/arm-mve-builtins-base.def (vld1q): Add predicated
>   version.
>   (vldrbq): New.
>   (vldrhq): New.
>   (vldrwq): New.
>   (vst1q): Add predicated version.
>   (vstrbq): New.
>   (vstrhq): New.
>   (vstrwq): New.
>   (vrev32q): Update types to float_16.
>   * config/arm/arm-mve-builtins-base.h (vldrbq): New.
>   (vldrhq): New.
>   (vldrwq): New.
>   (vstrbq): New.
>   (vstrhq): New.
>   (vstrwq): New.
>   * config/arm/arm-mve-builtins-functions.h (memory_vector_mode):
>   Remove conversion of floating point vectors to integer.
>   * config/arm/arm-mve-builtins.cc (TYPES_float16): Change to...
>   (TYPES_float_16): ...this.
>   (TYPES_float_32): New.
>   (float16): Change to...
>   (float_16): ...this.
>   (float_32): New.
>   (preds_z_or_none): New.
>   (function_resolver::check_gp_argument): Add support for _z
>   predicate.
>   * config/arm/arm_mve.h (vstrbq): Remove.
>   (vstrbq_p): Likewise.
>   (vstrhq): Likewise.
>   (vstrhq_p): Likewise.
>   (vstrwq): Likewise.
>   (vstrwq_p): Likewise.
>   (vst1q_p): Likewise.
>   (vld1q_z): Likewise.
>   (vldrbq_s8): Likewise.
>   (vldrbq_u8): Likewise.
>   (vldrbq_s16): Likewise.
>   (vldrbq_u16): Likewise.
>   (vldrbq_s32): Likewise.
>   (vldrbq_u32): Likewise.
>   (vstrbq_p_s8): Likewise.
>   (vstrbq_p_s32): Likewise.
>   (vstrbq_p_s16): Likewise.
>   (vstrbq_p_u8): Likewise.
>   (vstrbq_p_u32): Likewise.
>   (vstrbq_p_u16): Likewise.
>   (vldrbq_z_s16): Likewise.
>   (vldrbq_z_u8): Likewise.
>   (vldrbq_z_s8): Likewise.
>   (vldrbq_z_s32): Likewise.
>   (vldrbq_z_u16): Likewise.
>   (vldrbq_z_u32): Likewise.
>   (vldrhq_s32): Likewise.
>   (vldrhq_s16): Likewise.
>   (vldrhq_u32): Likewise.
>   (vldrhq_u16): Likewise.
>   (vldrhq_z_s32): Likewise.
>   (vldrhq_z_s16): Likewise.
>   (vldrhq_z_u32): Likewise.
>   (vldrhq_z_u16): Likewise.
>   (vldrwq_s32): Likewise.
>   (vldrwq_u32): Likewise.
>   (vldrwq_z_s32): Likewise.
>   (vldrwq_z_u32): Likewise.
>   (vldrhq_f16): Likewise.
>   (vldrhq_z_f16): Likewise.
>   (vldrwq_f32): Likewise.
>   (vldrwq_z_f32): Likewise.
>   (vstrhq_f16): Likewise.
>   (vstrhq_s32): Likewise.
>   (vstrhq_s16): Likewise.
>   (vstrhq_u32): Likewise.
>   (vstrhq_u16): Likewise.
>   (vstrhq_p_f16): Likewise.
>   (vstrhq_p_s32): Likewise.
>   (vstrhq_p_s16): Likewise.
>   (vstrhq_p_u32): Likewise.
>   (vstrhq_p_u16): Likewise.
>   (vstrwq_f32): Likewise.
>   (vstrwq_s32): Likewise.
>   (vstrwq_u32): Likewise.
>   (vstrwq_p_f32): Likewise.
>   (vstrwq_p_s32): Likewise.
>   (vstrwq_p_u32): Likewise.
>   (vst1q_p_u8): Likewise.
>   (vst1q_p_s8): Likewise.
>   (vld1q_z_u8): Likewise.
>   (vld1q_z_s8): Likewise.
>   (vst1q_p_u16): Likewise.
>   (vst1q_p_s16): Likewise.
>   (vld1q_z_u16): Likewise.
>   (vld1q_z_s16): Likewise.
>   (vst1q_p_u32): Likewise.
>   (vst1q_p_s32): Likewise.
>   (vld1q_z_u32): Likewise.
>   (vld1q_z_s32): Likewise.
>   (vld1q_z_f16): Likewise.
>   (vst1q_p_f16): Likewise.
>   (vld1q_z_f32): Likewise.
>   (vst1q_p_f32): Likewise.
>   (__arm_vstrbq_s8): Likewise.
>   (__arm_vstrbq_s32): Likewise.
>   (__arm_vstrbq_s16): Likewise.
>   (__arm_vstrbq_u8): Likewise.
>   (__arm_vstrbq_u32): Likewise.
>   (__arm_vstrbq_u16): Likewise.
>   (__arm_vldrbq_s8): Likewise.
>   (__arm_vldrbq_u8): Likewise.
>   (__arm_vldrbq_s16): Likewise.
>   (__arm_vldrbq_u16): Likewise.
>   (__arm_vldrbq_s32): Likewise.
>   (__arm_vldrbq_u32): Likewise.
>   (__arm_vstrbq_p_s8): Likewise.
>   (__arm_vstrbq_p_s32):

[PATCH 3/7] libstdc++: Inline memmove optimizations for std::copy etc. [PR115444]

2024-10-15 Thread Jonathan Wakely
This is a slightly different approach to C++98 compatibility than used
in patch 1/7 of this series for the uninitialized algos. It worked out a
bit cleaner this way for these algos, I think.

Tested x86_64-linux.

-- >8 --

This removes all the __copy_move class template specializations that
decide how to optimize std::copy and std::copy_n. We can inline those
optimizations into the algorithms, using if-constexpr (and macros for
C++98 compatibility) and remove the code dispatching to the various
class template specializations.

Doing this means we implement the optimization directly for std::copy_n
instead of deferring to std::copy. That avoids the unwanted consequence
of advancing the iterator in copy_n only to take the difference later to
get back to the length that we already had in copy_n originally (as
described in PR 115444).

With the new flattened implementations, we can also lower contiguous
iterators to pointers in std::copy/std::copy_n/std::copy_backward, so
that they benefit from the same memmove optimizations as pointers.
There's a subtlety though: contiguous iterators can potentially throw
exceptions to exit the algorithm early.  So we can only transform the
loop to memmove if dereferencing the iterator is noexcept. We don't
check that incrementing the iterator is noexcept because we advance the
contiguous iterators before using memmove, so that if incrementing would
throw, that happens first. I am writing a proposal (P3249R0) which would
make this unnecessary, so I hope we can drop the nothrow requirements
later.

This change also solves PR 114817 by checking is_trivially_assignable
before optimizing copy/copy_n etc. to memmove. It's not enough to check
that the types are trivially copyable (a precondition for using memmove
at all); we also need to check that the specific assignment that would
be performed by the algorithm is also trivial. Replacing a non-trivial
assignment with memmove would be observable, so not allowed.
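
As an illustration of the difference between the two checks (not taken
from the new tests): a type can be trivially copyable yet not assignable
at all, so a memmove-based std::copy would wrongly accept it:

  struct S
  {
    S() = default;
    S(const S&) = default;
    S& operator=(const S&) = delete;  // copy assignment not allowed
    S& operator=(S&&) = default;      // trivial, keeps S trivially copyable
  };
  // std::is_trivially_copyable_v<S> is true, but
  // std::is_trivially_assignable_v<S&, const S&> is false, so std::copy
  // from a const S* source must be rejected rather than turned into memmove.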

libstdc++-v3/ChangeLog:

PR libstdc++/115444
PR libstdc++/114817
* include/bits/stl_algo.h (__copy_n): Remove generic overload
and overload for random access iterators.
(copy_n): Inline generic version of __copy_n here. Do not defer
to std::copy for random access iterators.
* include/bits/stl_algobase.h (__copy_move): Remove.
(__nothrow_contiguous_iterator, __memcpyable_iterators): New
concepts.
(__assign_one, _GLIBCXX_TO_ADDR, _GLIBCXX_ADVANCE): New helpers.
(__copy_move_a2): Inline __copy_move logic and conditional
memmove optimization into the most generic overload.
(__copy_n_a): Likewise.
(__copy_move_backward): Remove.
(__copy_move_backward_a2): Inline __copy_move_backward logic and
memmove optimization into the most generic overload.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/114817.cc:
New test.
* 
testsuite/20_util/specialized_algorithms/uninitialized_copy_n/114817.cc:
New test.
* testsuite/25_algorithms/copy/114817.cc: New test.
* testsuite/25_algorithms/copy/115444.cc: New test.
* testsuite/25_algorithms/copy_n/114817.cc: New test.
---
 libstdc++-v3/include/bits/stl_algo.h  |  24 +-
 libstdc++-v3/include/bits/stl_algobase.h  | 426 +-
 .../uninitialized_copy/114817.cc  |  39 ++
 .../uninitialized_copy_n/114817.cc|  39 ++
 .../testsuite/25_algorithms/copy/114817.cc|  38 ++
 .../testsuite/25_algorithms/copy/115444.cc|  93 
 .../testsuite/25_algorithms/copy_n/114817.cc  |  38 ++
 7 files changed, 469 insertions(+), 228 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/114817.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy_n/114817.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/copy/114817.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/copy/115444.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/copy_n/114817.cc

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index a1ef665506d..489ce7e14d2 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -665,25 +665,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __result;
 }
 
-  template<typename _InputIterator, typename _Size, typename _OutputIterator>
-_GLIBCXX20_CONSTEXPR
-_OutputIterator
-__copy_n(_InputIterator __first, _Size __n,
-_OutputIterator __result, input_iterator_tag)
-{
-  return std::__niter_wrap(__result,
-  __copy_n_a(__first, __n,
- std::__niter_base(__result), true));
-}
-
-  template<typename _RandomAccessIterator, typename _Size,
-	   typename _OutputIterator>
-_GLIBCXX20_CONSTEXPR
-inline _OutputIterator
-__copy_n(_RandomAccessIterator __first, _Size __n,
-_OutputIterator __result, random_access_iterator_tag)
-{ return std::copy(__first, __first + __n, __result); }

[PATCH 6/7] libstdc++: Add always_inline to some one-liners in <bits/stl_algobase.h>

2024-10-15 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

We implement std::copy, std::fill etc. as a series of calls to other
overloads which incrementally peel off layers of iterator wrappers. This
adds a high abstraction penalty for -O0 and potentially even -O1. Add
the always_inline attribute to several functions that are just a single
return statement (and maybe a static_assert, or some concept-checking
assertions which are disabled by default).
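
Roughly the chain being annotated (simplified, using the names from the
hunks below); at -O0 each layer would otherwise survive as a real call:

  // std::copy(first, last, out)
  //   -> std::__copy_move_a      (unwraps move_iterator and friends)
  //     -> std::__copy_move_a1   (peels the deque-iterator specializations)
  //       -> std::__copy_move_a2 (the actual loop or memmove)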

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (__copy_move_a1, __copy_move_a)
(__copy_move_backward_a1, __copy_move_backward_a, move_backward)
(__fill_a1, __fill_a, fill, __fill_n_a, fill_n, __equal_aux):
Add always_inline attribute to one-line forwarding functions.
---
 libstdc++-v3/include/bits/stl_algobase.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index d9d1d00b113..b2f5b96d46e 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -500,12 +500,14 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 __copy_move_a1(_II, _II, _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>);
 
   template<bool _IsMove, typename _II, typename _OI>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OI
 __copy_move_a1(_II __first, _II __last, _OI __result)
 { return std::__copy_move_a2<_IsMove>(__first, __last, __result); }
 
   template<bool _IsMove, typename _II, typename _OI>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OI
 __copy_move_a(_II __first, _II __last, _OI __result)
@@ -757,6 +759,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #undef _GLIBCXX_ADVANCE
 
   template<bool _IsMove, typename _BI1, typename _BI2>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _BI2
 __copy_move_backward_a1(_BI1 __first, _BI1 __last, _BI2 __result)
@@ -785,6 +788,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
_GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>);
 
   template<bool _IsMove, typename _II, typename _OI>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OI
 __copy_move_backward_a(_II __first, _II __last, _OI __result)
@@ -840,6 +844,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
*  that the start of the output range may overlap [first,last).
   */
   template<typename _BI1, typename _BI2>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _BI2
 copy_backward(_BI1 __first, _BI1 __last, _BI2 __result)
@@ -875,6 +880,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
*  that the start of the output range may overlap [first,last).
   */
   template<typename _BI1, typename _BI2>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _BI2
 move_backward(_BI1 __first, _BI1 __last, _BI2 __result)
@@ -958,6 +964,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 }
 
   template<typename _Ite, typename _Cont, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline void
 __fill_a1(::__gnu_cxx::__normal_iterator<_Ite, _Cont> __first,
@@ -977,6 +984,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
const bool&);
 
   template<typename _FIte, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline void
 __fill_a(_FIte __first, _FIte __last, const _Tp& __value)
@@ -1002,6 +1010,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
*  to @c memset or @c wmemset.
   */
   template<typename _ForwardIterator, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline void
 fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value)
@@ -1108,6 +1117,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   std::input_iterator_tag);
 
   template<typename _OutputIterator, typename _Size, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OutputIterator
 __fill_n_a(_OutputIterator __first, _Size __n, const _Tp& __value,
@@ -1120,6 +1130,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 }
 
   template<typename _OutputIterator, typename _Size, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OutputIterator
 __fill_n_a(_OutputIterator __first, _Size __n, const _Tp& __value,
@@ -1132,6 +1143,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 }
 
   template<typename _OutputIterator, typename _Size, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OutputIterator
 __fill_n_a(_OutputIterator __first, _Size __n, const _Tp& __value,
@@ -1167,6 +1179,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   // DR 865. More algorithms that throw away information
   // DR 426. search_n(), fill_n(), and generate_n() with negative n
   template<typename _OI, typename _Size, typename _Tp>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline _OI
 fill_n(_OI __first, _Size __n, const _Tp& __value)
@@ -1246,6 +1259,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 }
 
   template<typename _II1, typename _II2>
+__attribute__((__always_inline__))
 _GLIBCXX20_CONSTEXPR
 inline bool
 __equal_aux(_II1 __first1, _II1 __last1, _II2 __first2)
-- 
2.46.2



Re: [PATCH 1/2] [Middle-end] Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).

2024-10-15 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Oct 15, 2024 at 5:30 AM liuhongt  wrote:
>>
>> For x86 masked fma, there're 2 rtl representations
>> 1) (vec_merge (fma op2 op1 op3) op1 mask)
>> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>>
>>  5894  (define_insn "<avx512>_fmadd_<mode>_mask"
>>  5895    [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>>  5896      (vec_merge:VFH_AVX512VL
>>  5897        (fma:VFH_AVX512VL
>>  5898          (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>>  5899          (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
>>  5900          (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
>>  5901        (match_dup 1)
>>  5902        (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
>>  5903    "TARGET_AVX512F && <round_mode_condition>"
>>  5904    "@
>>  5905     vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
>>  5906     vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
>>  5907    [(set_attr "type" "ssemuladd")
>>  5908     (set_attr "prefix" "evex")
>>  5909     (set_attr "mode" "<MODE>")])
>>
>> Here op1 has constraint "0", and the second op1 is (match_dup 1);
>> we once tried to replace it with (match_operand:M 5
>> "nonimmediate_operand" "0")) to enable more flexibility for pattern
>> match and recog, but it triggered an ICE in reload (reload can handle
>> at most one operand with "0" constraint).
>>
>> So we need either add 2 patterns in the backend or just do the
>> canonicalization in the middle-end.
>
> Nice spot to handle this.  OK with the minor not below fixed
> and in case there are no further comments from CCed folks.
>
> I'll note there's (vec_select (vec_concat ()) as alternate
> way to perform a (vec_merge ...) but I don't feel strongly
> for supporting that alternative without evidence it's used.
> aarch64 seems to use an UNSPEC for masking but it
> seems to have at least two patterns to merge with
> either the first or the third input but failing to handle the
> other (second) operand of a multiplication (*cond_fma_2 and _4);
> as both are "register_operand" I don't see how canonicalization
> works there?

We try to handle that in the expander instead:

/* Swap the multiplication operands if the fallback value is the
   second of the two.  */
if (rtx_equal_p (operands[3], operands[5]))
  std::swap (operands[2], operands[3]);

I suppose that doesn't help if the equivalence is only discovered
by later RTL optimisations.  Hopefully that should be rare though.

> Of course we can't do anything for UNSPECs.

Yeah.  If we were going to represent the SVE operations in generic RTL,
then I think the natural choice would be if_then_else.  vec_merge wouldn't
really work, since the predicates are vector-like rather than integers,
and since the operands can be variable length.

But there were two reasons for not doing that:

(1) It doesn't accurately describe the behaviour of FP operations.
E.g. an if_then_else or vec_merge of an fma does a full-vector fma
and then discards some of the results.  Taken at face value, that
should raise the same exceptions as a plain fma.  The SVE instructions
instead only raise exceptions for active lanes.

(2) For SVE, the predicate is often mandatory.  I was worried that if we
exposed it as generic RTL, simplify-rtx might apply some tricks that
get rid of the condition/predicate, and so get trapped in dead-end
recog attempts that made less progress than the UNSPEC versions.

Thanks,
Richard


Re: [PATCH] warning option for traps (-Wtrap)

2024-10-15 Thread Jakub Jelinek
On Tue, Oct 15, 2024 at 11:50:21AM +0200, Richard Biener wrote:
> > Would it be reasonable to approve this patch now and I try
> > to improve this later?
> 
> On the patch itself:
> 
>  void
>  expand_builtin_trap (void)
>  {
> +  if (warn_trap)
> +{
> +  location_t current_location =

Formatting-wise, = shouldn't be at the end of line.

> +   linemap_unwind_to_first_non_reserved_loc (line_table, 
> input_location,
> + NULL);
> +   warning_at (current_location, OPT_Wtrap, "trap generated");
> +}
> +
>if (targetm.have_trap ())
> 
> this also diagnoses calls the user puts in by calling __builtin_trap (),
> the documentation should probably mention this.  I see the only testcase
> exercises only this path.  I have doubts -fsanitize-trap with any
> sanitizer will ever yield a clean binary, so I wonder about practical
> uses besides very small testcases?

Given that even simple
int foo (int x, int y) { return x + y; }
calls it for -fsanitize=undefined -fsanitize-trap=undefined, and more
importantly, we try not to optimize away sanitizer checks based on VRP and
other optimizations at least to some extent, because VRP and other
optimizations optimize on UB not happening while sanitizers try to catch
the UB, I have serious doubts about the warning.
One would need completely different approach, where we try as much as
possible to prove UB can't happen and only warn if we couldn't prove it
can't.  Still, there would be tons of false positives where things just
aren't inlined and we can't prove UB won't happen, or warnings on dead
code...

Jakub



[PATCH] genmatch: Add selftests to genmatch for diag_vfprintf

2024-10-15 Thread Jakub Jelinek
Hi!

The following patch adds selftests to genmatch to verify the new printing
routine there.
So that I can rely on HAVE_DECL_FMEMOPEN (host test), the tests are done
solely in stage2+ where we link the host libcpp etc. to genmatch.
The tests have been adjusted from pretty-print.cc (test_pp_format),
and I've added to that function two new tests because I've noticed nothing
was testing the %M$.*N$s etc. format specifiers.
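
For reference, the POSIX printf analogue of those specifiers (just an
illustration, not part of the patch): %2$.*1$s prints argument 2 as a
string with the precision taken from argument 1:

  printf ("%2$.*1$s\n", 3, "abcdef");  /* prints "abc" */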

Tested on x86_64-linux, ok for trunk?

2024-10-15  Jakub Jelinek  

* configure.ac (gcc_AC_CHECK_DECLS): Add fmemopen.
* configure: Regenerate.
* config.in: Regenerate.
* Makefile.in (build/genmatch.o): Add -DGENMATCH_SELFTESTS to
BUILD_CPPFLAGS for stage2+ genmatch.
* genmatch.cc (test_diag_vfprintf, genmatch_diag_selftests): New
functions.
(main): Call genmatch_diag_selftests.
* pretty-print.cc (test_pp_format): Add two tests, one for %M$.*N$s
and one for %M$.Ns.

--- gcc/configure.ac.jj 2024-09-24 15:14:54.044163018 +0200
+++ gcc/configure.ac2024-10-15 11:06:04.568842228 +0200
@@ -1629,7 +1629,7 @@ gcc_AC_CHECK_DECLS(getenv atol atoll asp
madvise stpcpy strnlen strsignal strverscmp \
strtol strtoul strtoll strtoull setenv unsetenv \
errno snprintf vsnprintf vasprintf malloc realloc calloc \
-   free getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[
+   free getopt clock getpagesize ffs fmemopen gcc_UNLOCKED_FUNCS, , ,[
 #include "ansidecl.h"
 #include "system.h"])
 
--- gcc/configure.jj2024-09-24 15:14:54.043163031 +0200
+++ gcc/configure   2024-10-15 11:06:10.606756527 +0200
@@ -12084,7 +12084,7 @@ for ac_func in getenv atol atoll asprint
madvise stpcpy strnlen strsignal strverscmp \
strtol strtoul strtoll strtoull setenv unsetenv \
errno snprintf vsnprintf vasprintf malloc realloc calloc \
-   free getopt clock getpagesize ffs clearerr_unlocked feof_unlocked   
ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   fileno_unlocked 
fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked fwrite_unlocked 
getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
+   free getopt clock getpagesize ffs fmemopen clearerr_unlocked 
feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   
fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked 
fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
 do
   ac_tr_decl=`$as_echo "HAVE_DECL_$ac_func" | $as_tr_cpp`
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $ac_func is 
declared" >&5
--- gcc/config.in.jj2024-07-31 21:47:22.637999164 +0200
+++ gcc/config.in   2024-10-15 11:06:13.153720379 +0200
@@ -1018,6 +1018,13 @@
 #endif
 
 
+/* Define to 1 if we found a declaration for 'fmemopen', otherwise define to
+   0. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_FMEMOPEN
+#endif
+
+
 /* Define to 1 if we found a declaration for 'fprintf_unlocked', otherwise
define to 0. */
 #ifndef USED_FOR_TARGET
--- gcc/Makefile.in.jj  2024-10-14 19:40:46.989958038 +0200
+++ gcc/Makefile.in 2024-10-15 10:56:52.027684998 +0200
@@ -3143,6 +3143,7 @@ else
 BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)
 build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
 build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV)
+build/genmatch.o: BUILD_CPPFLAGS += -DGENMATCH_SELFTESTS
 endif
 
 build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \
--- gcc/genmatch.cc.jj  2024-10-14 19:40:46.990958024 +0200
+++ gcc/genmatch.cc 2024-10-15 11:33:54.521110736 +0200
@@ -584,6 +584,138 @@ diag_vfprintf (FILE *f, int err_no, cons
   fprintf (f, "%s", q);
 }
 
+#if defined(GENMATCH_SELFTESTS) && defined(HAVE_DECL_FMEMOPEN)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wsuggest-attribute=format"
+
+static void
+test_diag_vfprintf (const char *expected, const char *msg, ...)
+{
+  char buf[256];
+  va_list ap;
+  FILE *f = fmemopen (buf, 256, "w");
+  gcc_assert (f != NULL);
+  va_start (ap, msg);
+  diag_vfprintf (f, 0, msg, &ap);
+  va_end (ap);
+  fclose (f);
+  gcc_assert (strcmp (buf, expected) == 0);
+}
+
+#pragma GCC diagnostic pop
+
+static void
+genmatch_diag_selftests (void)
+{
+  /* Verify that plain text is passed through unchanged.  */
+  test_diag_vfprintf ("unformatted", "unformatted");
+
+  /* Verify various individual format codes, in the order listed in the
+ comment for pp_format above.  For each code, we append a second
+ argument with a known bit pattern (0x12345678), to ensure that we
+ are consuming arguments correctly.  */
+  test_diag_vfprintf ("-27 12345678", "%d %x", -27, 0x12345678);
+  test_diag_vfprintf ("-5 12345678", "%i %x", -5, 0x12345678);
+  test_diag_vfprintf ("10 12345678", "%u %x", 10, 0x12345678);
+  test_diag_vfprintf ("17 12345678", "%o %x", 15, 0x12345678);
+  test_diag_vfprintf ("cafebabe 12345678", "%x %x", 0xcaf

Re: [PATCH] warning option for traps (-Wtrap)

2024-10-15 Thread Richard Biener
On Sun, 13 Oct 2024, Martin Uecker wrote:

> Am Sonntag, dem 13.10.2024 um 10:56 +0200 schrieb Richard Biener:
> > On Sat, 12 Oct 2024, Martin Uecker wrote:
> > 
> > > Am Samstag, dem 12.10.2024 um 18:44 +0200 schrieb Richard Biener:
> > > > 
> > > > > Am 12.10.2024 um 16:43 schrieb Martin Uecker :
> > > > > 
> > > > > 
> > > > > There is code which should not fail at run-time.  For this,
> > > > > it is helpful to get a warning when a compiler inserts traps
> > > > > (e.g. sanitizers, hardbools, __builtin_trap(), etc.).
> > > > > 
> > > > > Having a warning for this also has many other use cases, e.g.
> > > > > one can use it with some sanitizer to rule out that some
> > > > > piece of code has certain undefined behavior such as
> > > > > signed overflow or undefined behavior in left-shifts
> > > > > (one gets a warning if the optimizer does not prove the
> > > > > trap is dead and it is emitted).
> > > > > 
> > > > > Another potential use case could be writing tests.
> > > > > 
> > > > > 
> > > > > Bootstrapped and regression tested on x64_84.
> > > > > 
> > > > > 
> > > > >Add warning option that warns when a trap is generated.
> > > > > 
> > > > >This adds a warning option -Wtrap that is emitted in
> > > > >expand_builtin_trap.  It can be used to verify that traps
> > > > >are generated or - more importantly - are not generated
> > > > >under various conditions, e.g. for UBSan with -fsanitize-trap,
> > > > >hardbools, etc.
> > > > 
> > > > Isn’t it better to diagnose with more context from the callers that 
> > > > insert the trap?
> > > 
> > > More context would be better.  Should there be additional
> > > arguments when creating the call to the builtin?
> > 
> > Why not diagnose when we create the call? 
> 
> I agree that having optional warnings for all situation where there
> could be run-time UB (or a trap) would be useful.  But having a
> generic warning for all such situations would produce many warnings
> and also cover cases where we already have more specific warnings.
> 
> Doing it when the trap is generated directly gives me somewhat
> different information that I sometimes need: Is there a trap left
> in the generated binary?
> 
> We have a similar warning already for generating trampolines.
> 
> Before adding the warning to my local tree, I often looked at the
> generated assembly to look for  generated "ud2" instructions.  But this
> is painful and gives me even less context.
> 
> A practical example from failing to properly take integer
> promotions into account (adapted from a old bug in a crypto
> library) is:
> 
> uint32_t bar(int N, unsigned char key)
> {
>     unsigned int kappa = key << 24;
>     return kappa;
> }
> 
> which has UB that the warning tells me about (key is promoted to
> signed int, so the shift by 24 can overflow into the sign bit) and
> where adding a cast is required to eliminate it:
> https://godbolt.org/z/osvEsdcqc
> 
> 
> >  But sure, adding a diagnostic
> > argument would work, it might also work to distinguish calls we want to
> > diagnose from those we don't.
> 
> Would it be reasonable to approve this patch now and I try
> to improve this later?

On the patch itself:

 void
 expand_builtin_trap (void)
 {
+  if (warn_trap)
+{
+  location_t current_location =
+   linemap_unwind_to_first_non_reserved_loc (line_table, 
input_location,
+ NULL);
+   warning_at (current_location, OPT_Wtrap, "trap generated");
+}
+
   if (targetm.have_trap ())

this also diagnoses calls the user puts in by calling __builtin_trap (),
the documentation should probably mention this.  I see the only testcase
exercises only this path.  I have doubts -fsanitize-trap with any
sanitizer will ever yield a clean binary, so I wonder about practical
uses besides very small testcases?

Is there any reason to use linemap_unwind_to_first_non_reserved_loc
here?

Thanks,
Richard.

Re: [PATCH 1/4] middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Tamar Christina
Hi,

Thanks for the look,

The 10/15/2024 09:54, Richard Biener wrote:
> On Mon, 14 Oct 2024, Tamar Christina wrote:
> 
> > Hi All,
> > 
> > This patch series adds support for a target to do a direct conversion for 
> > zero
> > extends using permutes.
> > 
> > To do this it uses a target hook use_permute_for_promotion which must be
> > implemented by targets.  This hook is used to indicate:
> > 
> >  1. can a target do this for the given modes.
> 
> can_vec_perm_const_p?
> 
> >  3. can the target convert between various vector modes with a VIEW_CONVERT.
> 
> We have modes_tieable_p for this I think.
> 

Yes, though the reason I didn't use either of them was because they are 
reporting
a capability of the backend.  In which case the hook, which is already backend
specific, should answer these two.

I initially had these checks there, but they didn't seem to add value: for
promotions the masks depend only on the input and output modes, so they really
don't change.

When you have, say, a loop that does lots of conversions from char to int, it
seemed
like a waste to retest the same permute constants over and over again.

I can add them back in if you prefer...

> >  2. is it profitable for the target to do it.
> 
> So you say the target can do both ways but both zip and tbl are
> permute instructions so I really fail to see the point and why
> the target itself doesn't choose to use tbl for unpack.
> 
> Is the intent in the end to have VEC_PERM in the IL rather than
> VEC_UNPACK_* so it combines with other VEC_PERMs?
> 

Yes, and this happens quite often, e.g. load permutes or lane shuffles etc.
The reason for exposing them as VEC_PERM was to trigger further optimizations.

If you remember the ticket about LOAD_LANES, with this optimization and an open
encoding of LOAD_LANES we stop using it in cases where there's a zero extend 
after
the LOAD_LANES, because then you're doing effectively two permutes and the 
LOAD_LANES
is no longer beneficial. There are other examples, load and replicate etc.

> That said, I'm not against supporting VEC_PERM code gen from
> unsigned promotion but I don't see why we should do this when
> the target advertises VEC_UNPACK_* support or direct conversion
> support?
> 
> Esp. with adding a "local" cost related hook which cannot take
> into accout context.
> 

To summarize a long story:

  Yes, I open encode zero extends as permutes to allow further optimizations.  
One could convert
  vec_unpacks to convert optabs and use that, but that is an opaque value that 
can't be further
  optimized.

  The hook isn't really a costing thing in the general sense. It's literally 
just "do you want
  permutes yes or no".  The reason it gets the modes is simply that I don't 
think a single level
  extend is worth it, but I can just change it to never try to do this on more 
than one level.

I think think there's a lot of merrit in open-encoding zero extends, but one 
reason this is
beneficial on AArch64 for instance is that we can consume the zero register and 
rewrite the
indices to a single register TBL.  Two registers TBLs are slower on some 
implementations.
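To make the open-coded form concrete (my sketch, not taken from the patch):
a u8 -> u64 zero extend of the low two lanes can be a single VEC_PERM_EXPR
mixing the source with a zero vector, which maps directly onto a
single-register TBL where the zero-selecting indices read from the zero
register:

  /* x : vector(16) unsigned char, z : all-zeros vector, little endian.
     Indices 0..15 select from x, 16..31 from z.  */
  _1 = VEC_PERM_EXPR <x, z, { 0, 16, 16, 16, 16, 16, 16, 16,
			      1, 16, 16, 16, 16, 16, 16, 16 }>;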

Thanks,
Tamar

> > Using permutations has a big benefit for multi-step zero extensions
> > because they both reduce the number of needed instructions and also
> > increase throughput as the dependency chain is removed.
> > 
> > Concretely on AArch64 this changes:
> > 
> > void test4(unsigned char *x, long long *y, int n) {
> > for(int i = 0; i < n; i++) {
> > y[i] = x[i];
> > }
> > }
> > 
> > from generating:
> > 
> > .L4:
> > ldr q30, [x4], 16
> > add x3, x3, 128
> > zip1v1.16b, v30.16b, v31.16b
> > zip2v30.16b, v30.16b, v31.16b
> > zip1v2.8h, v1.8h, v31.8h
> > zip1v0.8h, v30.8h, v31.8h
> > zip2v1.8h, v1.8h, v31.8h
> > zip2v30.8h, v30.8h, v31.8h
> > zip1v26.4s, v2.4s, v31.4s
> > zip1v29.4s, v0.4s, v31.4s
> > zip1v28.4s, v1.4s, v31.4s
> > zip1v27.4s, v30.4s, v31.4s
> > zip2v2.4s, v2.4s, v31.4s
> > zip2v0.4s, v0.4s, v31.4s
> > zip2v1.4s, v1.4s, v31.4s
> > zip2v30.4s, v30.4s, v31.4s
> > stp q26, q2, [x3, -128]
> > stp q28, q1, [x3, -96]
> > stp q29, q0, [x3, -64]
> > stp q27, q30, [x3, -32]
> > cmp x4, x5
> > bne .L4
> > 
> > and instead we get:
> > 
> > .L4:
> > add x3, x3, 128
> > ldr q23, [x4], 16
> > tbl v5.16b, {v23.16b}, v31.16b
> > tbl v4.16b, {v23.16b}, v30.16b
> > tbl v3.16b, {v23.16b}, v29.16b
> > tbl v2.16b, {v23.16b}, v28.16b
> > tbl v1.16b, {v23.16b}, v27.16b
> > tbl v0.16b, {v23.16b}, v26.16b
> > tbl v22.16b, {v23.16b}, v25.16b
> > tbl v23.16b, {v23.16b}, v24.16b
> > stp q5, q4, [x3, -128]
> > 

Re: [PATCH v13 0/4] c: Add __lengthof__ operator

2024-10-15 Thread Alejandro Colomar
Hi Joseph,

On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote:
> On Wed, 9 Oct 2024, Alejandro Colomar wrote:
> 
> > Every little bit adds up.  Documentation is simpler if there is naming
> > consistency.  We have SYNOPSISes in the man pages, and they're up front,
> > because they constitute an important part of the documentation.
> 
> We also have a convention for future standard C interfaces to put the 
> length before the pointer so that a VLA parameter declaration can be used 
> that makes very clear the intent for how many elements the array has, 
> which seems much better for that purpose than relying on the name of a 
> parameter.

Just as a confirmation of what I already said: none of the arguments
convince me.  They seem like mitigations of the damage that overloading
the term "length" can do.

I stand by my proposal of either __nelementsof__() or __countof__() (with
no preference), any derivative of those, or almost anything that doesn't
derive from "length" (well, I also veto "dimension", "extent", and
"range", for different reasons, but anything else is fair game).

If you want _Lengthof, please sed(1) it yourself and sign the patch
below my signature.  I don't think you (or myself) can convince me of
changing my mind, so it's up to you to decide what you want to do.

I think it would be good to have this in GCC 15, so if you're convinced
of _Lengthof(), please go ahead already.  I don't think delaying this
further will change the mind of any of us.


Have a lovely day!
Alex

-- 





Re: libstdc++ fetch_add & fenv -- ecosystem questions

2024-10-15 Thread Matthew Malcomson
Thanks for the pointer — always linking with libatomic by default using
--as-needed sounds quite promising from my end.

I am not certain, but I suspect that needing libatomic for
atomic::fetch_{add,sub} would not mean libstdc++.so would get a
DT_NEEDED for libatomic.
The place where the new builtin would go is in a header that the user code 
would include (and would be inlined completely).
From a quick grep it appears this functionality in this header is not
currently used in the libstdc++.so library itself.
Not 100% confident on that yet — but enough to feel this approach is promising.
I guess if that is the case, then using the same hypothetical driver flag
suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81358#c9 for linking
libstdc++ seems like it could avoid adding the requirement without realising it.

From reading the comments in that bugzilla it doesn't look like there's any
objection to implementing that.
Seems that the testsuite and build system are the main things to watch out
for here, right?

Does that seem reasonable to others?

From: Joseph Myers 
Sent: 14 October 2024 6:47 PM
To: Matthew Malcomson 
Cc: Jonathan Wakely ; gcc-patches@gcc.gnu.org 

Subject: Re: libstdc++ fetch_add & fenv -- ecosystem questions



On Mon, 14 Oct 2024, Matthew Malcomson wrote:

>   4. __atomic_feraiseexcept should be a builtin to avoid the previously
> unnecessary requirement to link libatomic.

libatomic should be linked by default (with --as-needed); see bug 81358.
But if your concern is e.g. libstdc++.so having DT_NEEDED for libatomic,
that might not suffice.

__atomic_feraiseexcept is a bit long to expand inline all the time (it's
supposed to ensure that exceptions are raised including taking any enabled
traps, so it's not just manipulating bits in the floating-point status
register).  On the other hand, if you're only concerned with one
operation, only a subset of the checks in __atomic_feraiseexcept are
relevant (for example, addition and subtraction can't raise FE_DIVBYZERO;
and except for the exact underflow case which __atomic_feraiseexcept
doesn't handle anyway, they can't raise FE_UNDERFLOW).
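(Illustrative only — my reading of the above, not existing libstdc++ or GCC
code: the per-operation subset for an atomic floating add/sub could then be
as small as this.)

  #include <fenv.h>

  /* Sketch: raise only what an add/sub result can produce.  Addition
     cannot raise FE_DIVBYZERO, and (exact underflow aside) cannot raise
     FE_UNDERFLOW, so only these three need the trap-taking path.  */
  static void
  atomic_add_raiseexcept (int excepts)
  {
    feraiseexcept (excepts & (FE_INVALID | FE_OVERFLOW | FE_INEXACT));
  }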

--
Joseph S. Myers
josmy...@redhat.com



Re: Fortran test typebound_operator_7.f03 broken by non-Fortran commit. Confirm anyone?

2024-10-15 Thread Thomas Schwinge
Hi!

On 2024-10-14T21:18:17+0100, Sam James  wrote:
> Sam James  writes:
>> Andre Vehreschild  writes:
>>> [...] During latest regression testing of the Fortran suite I got
>>> typebound_operator_7.f03 failing with:
>>>
>>> typebound_operator_7.f03:94:25:
>>>
>>>    94 |   u = (u*2.0*4.0) + u*4.0
>>>       |                         1
>>> internal compiler error: tree check: expected function_decl, have
>>> indirect_ref in DECL_FUNCTION_CODE, at tree.h:4329
>>> 0x3642f3e internal_error(char const*, ...)
>>>	/mnt/work_store/gcc/gcc.test/gcc/diagnostic-global-context.cc:517
>>> 0x1c0a703 tree_check_failed(tree_node const*, char const*, int, char const*, ...)
>>>	/mnt/work_store/gcc/gcc.test/gcc/tree.cc:9003
>>> 0xeb9150 tree_check(tree_node const*, char const*, int, char const*, tree_code)
>>>	/mnt/work_store/gcc/gcc.test/gcc/tree.h:3921
>>> 0xf5725b DECL_FUNCTION_CODE(tree_node const*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/tree.h:4329
>>> 0xf383d6 update_builtin_function
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:4405
>>> 0xf468b9 gfc_conv_procedure_call(gfc_se*, gfc_symbol*, gfc_actual_arglist*, gfc_expr*, vec*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:8236
>>> 0xf48b0f gfc_conv_function_expr
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:8815
>>> 0xf4ceda gfc_conv_expr(gfc_se*, gfc_expr*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:9982
>>> 0xf40777 gfc_conv_procedure_call(gfc_se*, gfc_symbol*, gfc_actual_arglist*, gfc_expr*, vec*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:6816
>>> 0xf48b0f gfc_conv_function_expr
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:8815
>>> 0xf4ceda gfc_conv_expr(gfc_se*, gfc_expr*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:9982
>>> 0xf40777 gfc_conv_procedure_call(gfc_se*, gfc_symbol*, gfc_actual_arglist*, gfc_expr*, vec*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-expr.cc:6816
>>> 0xfb580a gfc_trans_call(gfc_code*, bool, tree_node*, tree_node*, bool)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-stmt.cc:425
>>> 0xed9363 trans_code
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans.cc:2434
>>> 0xed97d5 gfc_trans_code(gfc_code*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans.cc:2713
>>> 0xf26342 gfc_generate_function_code(gfc_namespace*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans-decl.cc:7958
>>> 0xed9819 gfc_generate_code(gfc_namespace*)
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/trans.cc:2730
>>> 0xe544ee translate_all_program_units
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/parse.cc:7156
>>> 0xe54e23 gfc_parse_file()
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/parse.cc:7473
>>> 0xebf7ce gfc_be_parse_file
>>>	/mnt/work_store/gcc/gcc.test/gcc/fortran/f95-lang.cc:241

>> Tobias Burnus (1):
>>   Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' 
>> __builtin_is_initial_device

> Indeed: https://gcc.gnu.org/PR117136.

On 2024-10-13T10:21:01+0200, Tobias Burnus  wrote:
> Now pushed as r15-4298-g3269a722b7a036.

> --- a/gcc/fortran/trans-expr.cc
> +++ b/gcc/fortran/trans-expr.cc

>  static void
> -conv_function_val (gfc_se * se, gfc_symbol * sym, gfc_expr * expr,
> -gfc_actual_arglist *actual_args)
> +conv_function_val (gfc_se * se, bool *is_builtin, gfc_symbol * sym,
> +gfc_expr * expr, gfc_actual_arglist *actual_args)
>  {
>tree tmp;
>  
> +  *is_builtin = false;
> [...]

Unconditionally initializes '*is_builtin' here...

> @@ -6324,6 +6366,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
>gfc_actual_arglist *arg;
>int has_alternate_specifier = 0;
>bool need_interface_mapping;
> +  bool is_builtin;
>bool callee_alloc;
>bool ulim_copy;
>gfc_typespec ts;
> @@ -8164,7 +8207,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
>  
>/* Generate the actual call.  */
>if (base_object == NULL_TREE)
> -conv_function_val (se, sym, expr, args);
> +conv_function_val (se, &is_builtin, sym, expr, args);
>else
>  conv_base_obj_fcn_val (se, base_object, expr);
>  
> @@ -8189,6 +8232,9 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
>fntype = TREE_TYPE (TREE_TYPE (se->expr));
>se->expr = build_call_vec (TREE_TYPE (fntype), se->expr, arglist);
>  
> +  if (is_builtin)
> +se->expr = update_builtin_function (se->expr, sym);
> +
>/* Allocatable scalar function results must be freed and nullified
>   after use. This necessitates the creation of a temporary to
>   hold the result to prevent duplicate calls.  */

..., however: 'conv_function_val' is not always called here, and therefore
'is_builtin' is not always initialized, giving rise to PR117136
"[15 regression] ICE for gfortran.dg/typebound_operator_11.f90 since
r15-4298-g3269a722b7a036".
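The minimal fix (my sketch; the pushed commit may differ in details) is to
give the flag an initializer, so the conv_base_obj_fcn_val path cannot read
an indeterminate value:

--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
-  bool is_builtin;
+  bool is_builtin = false;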
Based on Harald's analysis and patch, I've pushed to 

Re: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-15 Thread Richard Biener
On Mon, 14 Oct 2024, Tamar Christina wrote:

> Hi All,
> 
> This patch series adds support for a target to do a direct conversion for
> zero extends using permutes.
> 
> To do this it uses a target hook use_permute_for_promotion which must be
> implemented by targets.  This hook is used to indicate:
> 
>  1. can a target do this for the given modes.

can_vec_perm_const_p?

>  2. is it profitable for the target to do it.

So you say the target can do both ways but both zip and tbl are
permute instructions so I really fail to see the point and why
the target itself doesn't choose to use tbl for unpack.

Is the intent in the end to have VEC_PERM in the IL rather than
VEC_UNPACK_* so it combines with other VEC_PERMs?

That said, I'm not against supporting VEC_PERM code gen from
unsigned promotion but I don't see why we should do this when
the target advertises VEC_UNPACK_* support or direct conversion
support?

Esp. with adding a "local" cost related hook which cannot take
into account context.

>  3. can the target convert between various vector modes with a VIEW_CONVERT.

We have modes_tieable_p for this I think.

> Using permutations has a big benefit for multi-step zero extensions because
> they both reduce the number of needed instructions and also increase
> throughput as the dependency chain is removed.
> 
> Concretely on AArch64 this changes:
> 
> void test4(unsigned char *x, long long *y, int n) {
> for(int i = 0; i < n; i++) {
> y[i] = x[i];
> }
> }
> 
> from generating:
> 
> .L4:
> ldr q30, [x4], 16
> add x3, x3, 128
> zip1v1.16b, v30.16b, v31.16b
> zip2v30.16b, v30.16b, v31.16b
> zip1v2.8h, v1.8h, v31.8h
> zip1v0.8h, v30.8h, v31.8h
> zip2v1.8h, v1.8h, v31.8h
> zip2v30.8h, v30.8h, v31.8h
> zip1v26.4s, v2.4s, v31.4s
> zip1v29.4s, v0.4s, v31.4s
> zip1v28.4s, v1.4s, v31.4s
> zip1v27.4s, v30.4s, v31.4s
> zip2v2.4s, v2.4s, v31.4s
> zip2v0.4s, v0.4s, v31.4s
> zip2v1.4s, v1.4s, v31.4s
> zip2v30.4s, v30.4s, v31.4s
> stp q26, q2, [x3, -128]
> stp q28, q1, [x3, -96]
> stp q29, q0, [x3, -64]
> stp q27, q30, [x3, -32]
> cmp x4, x5
> bne .L4
> 
> and instead we get:
> 
> .L4:
> add x3, x3, 128
> ldr q23, [x4], 16
> tbl v5.16b, {v23.16b}, v31.16b
> tbl v4.16b, {v23.16b}, v30.16b
> tbl v3.16b, {v23.16b}, v29.16b
> tbl v2.16b, {v23.16b}, v28.16b
> tbl v1.16b, {v23.16b}, v27.16b
> tbl v0.16b, {v23.16b}, v26.16b
> tbl v22.16b, {v23.16b}, v25.16b
> tbl v23.16b, {v23.16b}, v24.16b
> stp q5, q4, [x3, -128]
> stp q3, q2, [x3, -96]
> stp q1, q0, [x3, -64]
> stp q22, q23, [x3, -32]
> cmp x4, x5
> bne .L4
> 
> Tests are added in the AArch64 patch introducing the hook.  The testsuite also
> already had about 800 runtime tests that get affected by this.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * target.def (use_permute_for_promotion): New.
>   * doc/tm.texi.in: Document it.
>   * doc/tm.texi: Regenerate.
>   * targhooks.cc (default_use_permute_for_promotion): New.
>   * targhooks.h (default_use_permute_for_promotion): New.
>   (vectorizable_conversion): Support direct conversion with permute.
>   * tree-vect-stmts.cc (vect_create_vectorized_promotion_stmts): Likewise.
>   (supportable_widening_operation): Likewise.
>   (vect_gen_perm_mask_any): Allow vector permutes where input registers
>   are half the width of the result per the GCC 14 relaxation of
>   VEC_PERM_EXPR.
> 
> ---
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> 4deb3d2c283a2964972b94f434370a6f57ea816a..e8192590ac14005bf7cb5f731c16ee7eacb78143
>  100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6480,6 +6480,15 @@ type @code{internal_fn}) should be considered 
> expensive when the mask is
>  all zeros.  GCC can then try to branch around the instruction instead.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION 
> (const_tree @var{in_type}, const_tree @var{out_type})
> +This hook returns true if the operation promoting @var{in_type} to
> +@var{out_type} should be done as a vector permute.  If @var{out_type} is
> +a signed type, the operation will be done as the related unsigned type and
> +converted to @var{out_type}.  If the target supports the needed permute,
> +is able to convert unsigned(@var{out_type}) to @var{out_type}, and it is
> +beneficial to do so, the hook should return true; else false should be
> +returned.
> +@end deftypefn
> +

[PATCH] testsuite: Simplify target test and dg-options for AMO tests

2024-10-15 Thread jeevitha
Hi All,

Removed powerpc*-*-* from the target test as it is always true.  Simplified
the options by removing -mpower9-misc and -mvsx, which are enabled by default
with -mdejagnu-cpu=power9.  The has_arch_pwr9 check is also true with
-mdejagnu-cpu=power9, so it has been removed.

2024-10-15 Jeevitha Palanisamy 

gcc/testsuite/

* gcc.target/powerpc/amo1.c: Removed powerpc*-*-* from the target and
simplified dg-options.
* gcc.target/powerpc/amo2.c: Simplified dg-options and added powerpc_vsx
target check.


diff --git a/gcc/testsuite/gcc.target/powerpc/amo1.c 
b/gcc/testsuite/gcc.target/powerpc/amo1.c
index c5af373b4e9..9a981cd4219 100644
--- a/gcc/testsuite/gcc.target/powerpc/amo1.c
+++ b/gcc/testsuite/gcc.target/powerpc/amo1.c
@@ -1,6 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-options "-mvsx -mpower9-misc -O2" } */
-/* { dg-additional-options "-mdejagnu-cpu=power9" { target { ! has_arch_pwr9 } } } */
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
 /* { dg-require-effective-target powerpc_vsx } */
 
 /* Verify P9 atomic memory operations.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/amo2.c 
b/gcc/testsuite/gcc.target/powerpc/amo2.c
index 592f0fb3f92..9e4ff0ce064 100644
--- a/gcc/testsuite/gcc.target/powerpc/amo2.c
+++ b/gcc/testsuite/gcc.target/powerpc/amo2.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
-/* { dg-options "-O2 -mvsx -mpower9-misc" } */
-/* { dg-additional-options "-mdejagnu-cpu=power9" { target { ! has_arch_pwr9 } } } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+/* { dg-require-effective-target powerpc_vsx } */
 
 #include 
 #include 





[PATCH] SVE intrinsics: Add fold_active_lanes_to method to refactor svmul and svdiv.

2024-10-15 Thread Jennifer Schmitz
As suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html,
this patch adds the method gimple_folder::fold_active_lanes_to (tree X).
This method folds active lanes to X and sets inactive lanes according to
the predication, returning a new gimple statement. That makes folding of
SVE intrinsics easier and reduces code duplication in the
svxxx_impl::fold implementations.
Using this new method, svdiv_impl::fold and svmul_impl::fold were refactored.
Additionally, the method was used for two optimizations:
1) fold svdiv to the dividend if the divisor is all ones, and
2) for svmul, fold to the other operand if one of the operands is all ones.
Both optimizations were previously applied to _x and _m predication on
the RTL level, but not for _z, where svdiv/svmul were still being used.
For both optimizations, codegen was improved by this patch, for example by
skipping sel instructions with all-same operands and replacing sel
instructions by mov instructions.
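In ACLE terms the folds preserve these identities (my paraphrase of the
description above, not code from the patch):

  svdiv_m (pg, x, svdup_s32 (1))  ->  x
  svdiv_x (pg, x, svdup_s32 (1))  ->  x   /* inactive lanes unspecified */
  svdiv_z (pg, x, svdup_s32 (1))  ->  svsel (pg, x, svdup_s32 (0))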

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Refactor using fold_active_lanes_to and fold to the dividend if the
divisor is all ones.
(svmul_impl::fold): Refactor using fold_active_lanes_to and fold
to the other operand, if one of the operands is all ones.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_active_lanes_to (tree).
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::fold_active_lanes_to): Add new method to fold
active lanes to the given argument, setting inactive lanes
according to the predication.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
* gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
* gcc.target/aarch64/sve/mul_const_run.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 39 -
 gcc/config/aarch64/aarch64-sve-builtins.cc| 27 
 gcc/config/aarch64/aarch64-sve-builtins.h |  1 +
 .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 13 +++---
 .../gcc.target/aarch64/sve/acle/asm/div_s64.c | 13 +++---
 .../gcc.target/aarch64/sve/acle/asm/div_u32.c | 13 +++---
 .../gcc.target/aarch64/sve/acle/asm/div_u64.c | 13 +++---
 .../gcc.target/aarch64/sve/acle/asm/mul_s16.c | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_s32.c | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_s64.c | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_s8.c  | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_u16.c | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_u32.c | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 43 +--
 .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  | 43 +--
 .../gcc.target/aarch64/sve/fold_div_zero.c| 12 ++
 .../gcc.target/aarch64/sve/mul_const_run.c|  6 +++
 17 files changed, 387 insertions(+), 94 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 1c17149e1f0..70bd83005d7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -758,18 +758,15 @@ public:
 if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
   return res;
 
-/* If the dividend is all zeros, fold to zero vector.  */
+/* If the divisor is all ones, fold to dividend.  */
 tree op1 = gimple_call_arg (f.call, 1);
-if (integer_zerop (op1))
-  return gimple_build_assign (f.lhs, op1);
-
-/* If the divisor is all zeros, fold to zero vector.  */
-tree pg = gimple_call_arg (f.call, 0);
 tree op2 = gimple_call_arg (f.call, 2);
-if (integer_zerop (op2)
-   && (f.pred != PRED_m
-   || is_ptrue (pg, f.type_suffix (0).element_bytes)))
-  return gimple_build_assign (f.lhs, build_zero_cst (TREE_TYPE (f.lhs)));
+if (integer_onep (op2))
+  return f.fold_active_lanes_to (op1);
+
+/* If one of the operands is all zeros, fold to zero vector.  */
+if (integer_zerop (op1) || i

Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread Uros Bizjak
On Tue, Oct 15, 2024 at 11:09 AM John Paul Adrian Glaubitz
 wrote:
>
> PR target/66207
> * config/alpha/alpha.opt (mlra): New target option.
> * config/alpha/alpha.cc (alpha_use_lra_p): New function.
> (TARGET_LRA_P): Use it.
> * config/alpha/alpha.opt.urls: Regenerate.

IMO, we should simply deprecate non-BWX targets.  If reload is going
away, then there is no way for non-BWX targets to access the reload
internals they require for compilation.  As mentioned in the PR,
non-BWX targets are being removed from distros anyway, so I guess there
is no point investing much time in modernizing them.

Uros.

>
> Signed-off-by: John Paul Adrian Glaubitz 
> ---
>  gcc/config/alpha/alpha.cc   | 10 +-
>  gcc/config/alpha/alpha.opt  |  5 +
>  gcc/config/alpha/alpha.opt.urls |  2 ++
>  3 files changed, 16 insertions(+), 1 deletion(-)
>
> v2:
> - Rephrase patch short summary
>
> diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
> index 74631a41693..218c66b6090 100644
> --- a/gcc/config/alpha/alpha.cc
> +++ b/gcc/config/alpha/alpha.cc
> @@ -9941,6 +9941,14 @@ alpha_c_mode_for_floating_type (enum tree_index ti)
>return default_mode_for_floating_type (ti);
>  }
>
> +/* Implement TARGET_LRA_P.  */
> +
> +static bool
> +alpha_use_lra_p ()
> +{
> +  return alpha_lra_p;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #if TARGET_ABI_OPEN_VMS
>  # undef TARGET_ATTRIBUTE_TABLE
> @@ -10124,7 +10132,7 @@ alpha_c_mode_for_floating_type (enum tree_index ti)
>  #endif
>
>  #undef TARGET_LRA_P
> -#define TARGET_LRA_P hook_bool_void_false
> +#define TARGET_LRA_P alpha_use_lra_p
>
>  #undef TARGET_LEGITIMATE_ADDRESS_P
>  #define TARGET_LEGITIMATE_ADDRESS_P alpha_legitimate_address_p
> diff --git a/gcc/config/alpha/alpha.opt b/gcc/config/alpha/alpha.opt
> index 62543d2689c..a4d6d58724a 100644
> --- a/gcc/config/alpha/alpha.opt
> +++ b/gcc/config/alpha/alpha.opt
> @@ -89,6 +89,11 @@ mlarge-text
>  Target RejectNegative InverseMask(SMALL_TEXT)
>  Emit indirect branches to local functions.
>
> +mlra
> +Target Var(alpha_lra_p) Undocumented
> +Use LRA for reload instead of the old reload framework.  This option is
> +experimental, and it may be removed in future versions of the compiler.
> +
>  mtls-kernel
>  Target Mask(TLS_KERNEL)
>  Emit rdval instead of rduniq for thread pointer.
> diff --git a/gcc/config/alpha/alpha.opt.urls b/gcc/config/alpha/alpha.opt.urls
> index a55c08328c3..916a3013f63 100644
> --- a/gcc/config/alpha/alpha.opt.urls
> +++ b/gcc/config/alpha/alpha.opt.urls
> @@ -44,6 +44,8 @@ UrlSuffix(gcc/DEC-Alpha-Options.html#index-msmall-data)
>  mlarge-data
>  UrlSuffix(gcc/DEC-Alpha-Options.html#index-mlarge-data)
>
> +; skipping UrlSuffix for 'mlra' due to finding no URLs
> +
>  msmall-text
>  UrlSuffix(gcc/DEC-Alpha-Options.html#index-msmall-text)
>
> --
> 2.39.5
>


[PATCH] SVE intrinsics: Fold division and multiplication by -1 to neg.

2024-10-15 Thread Jennifer Schmitz
Because a neg instruction has lower latency and higher throughput than
sdiv and mul, svdiv and svmul by -1 can be folded to svneg. For svdiv,
this is already implemented on the RTL level; for svmul, the
optimization was still missing.
This patch implements folding to svneg for both operations using the
gimple_folder. For svdiv, the transform is applied if the divisor is -1.
Svmul is folded if either of the operands is -1. A case distinction of
the predication is made to account for the fact that svneg_m has 3 arguments
(argument 0 holds the values for the inactive lanes), while svneg_x and
svneg_z have only 2 arguments.
Tests were added or adjusted to check the produced assembly and runtime
tests were added to check correctness.
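Spelled out, the intended mapping is (my summary of the above; argument
order per the ACLE, where svneg_m takes the inactive-lanes vector first):

  svmul_x (pg, x, svdup_s32 (-1))  ->  svneg_x (pg, x)
  svmul_z (pg, x, svdup_s32 (-1))  ->  svneg_z (pg, x)
  svmul_m (pg, x, svdup_s32 (-1))  ->  svneg_m (x, pg, x)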

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Fold division by -1 to svneg.
(svmul_impl::fold): Fold multiplication by -1 to svneg.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/div_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/div_const_run.c: New test.
* gcc.target/aarch64/sve/mul_const_run.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 73 ---
 .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 59 +++
 .../gcc.target/aarch64/sve/acle/asm/mul_s16.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_s32.c | 48 +++-
 .../gcc.target/aarch64/sve/acle/asm/mul_s64.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_s8.c  |  7 +-
 .../gcc.target/aarch64/sve/div_const_run.c| 10 ++-
 .../gcc.target/aarch64/sve/mul_const_run.c| 10 ++-
 8 files changed, 189 insertions(+), 28 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index e7eba20f07a..2312b124c29 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -768,6 +768,27 @@ public:
 if (integer_zerop (op1) || integer_zerop (op2))
   return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
+/* If the divisor is all integer -1, fold to svneg.  */
+tree pg = gimple_call_arg (f.call, 0);
+if (!f.type_suffix (0).unsigned_p && integer_minus_onep (op2))
+  {
+   function_instance instance ("svneg", functions::svneg,
+   shapes::unary, MODE_none,
+   f.type_suffix_ids, GROUP_none, f.pred);
+   gcall *call = f.redirect_call (instance);
+   unsigned offset_index = 0;
+   if (f.pred == PRED_m)
+ {
+   offset_index = 1;
+   gimple_call_set_arg (call, 0, op1);
+ }
+   else
+ gimple_set_num_ops (call, 5);
+   gimple_call_set_arg (call, offset_index, pg);
+   gimple_call_set_arg (call, offset_index + 1, op1);
+   return call;
+  }
+
 /* If the divisor is a uniform power of 2, fold to a shift
instruction.  */
 tree op2_cst = uniform_integer_cst_p (op2);
@@ -2033,12 +2054,37 @@ public:
 if (integer_zerop (op1) || integer_zerop (op2))
   return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
+/* If one of the operands is all integer -1, fold to svneg.  */
+tree pg = gimple_call_arg (f.call, 0);
+tree negated_op = NULL;
+if (integer_minus_onep (op2))
+  negated_op = op1;
+else if (integer_minus_onep (op1))
+  negated_op = op2;
+if (!f.type_suffix (0).unsigned_p && negated_op)
+  {
+   function_instance instance ("svneg", functions::svneg,
+   shapes::unary, MODE_none,
+   f.type_suffix_ids, GROUP_none, f.pred);
+   gcall *call = f.redirect_call (instance);
+   unsigned offset_index = 0;
+   if (f.pred == PRED_m)
+ {
+   offset_index = 1;
+   gimple_call_set_arg (call, 0, op1);
+ }
+   else
+ gimple_set_num_ops (call, 5);
+   gimple_call_set_arg (call, offset_index, pg);
+   gimple_call_set_arg (call, offset_index + 1, negated_op);
+   return call;
+  }
+
 /* If one of the operands is a uniform power of 2, fold to a left shift
by immediate.  */
-tree pg = gimple_call_arg (f.call, 0);
 tree op1_cst = uniform_integer_cst_p (op1);
 tree op2_cst = uniform_integer_cst_p (op2);
-tree shift_op1, shift_op2;
+tree shift_op1, shift_op2 = NULL;
 if (op1_cst && integer_pow2p (op1_cst)
&& (f.pred != PRED_m
|| is_ptrue (pg, f.type_suffix (0).element_bytes)))
@@ -205

Re: [PATCH] Support andn_optab for x86

2024-10-15 Thread Uros Bizjak
On Tue, Oct 15, 2024 at 8:09 AM Cui, Lili  wrote:
>
> Hi all,
>
> This patch is to add andn_optab for x86.
>
> Bootstrapped and regtested on x86-64-linux-pc, OK for trunk?
>
>
> Regards,
> Lili.
>
> Add a new andn pattern to match the new optab added by
> r15-1890-gf379596e0ba99d.  Only enable 64-bit, 128-bit and
> 256-bit vector ANDN, since x86-64 has mask mov instructions
> when AVX512 is enabled.
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (andn<mode>3): New.
> * config/i386/mmx.md (andn<mode>3): New.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/vect-cmp.C: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/mmx.md   |  7 +++
>  gcc/config/i386/sse.md   |  7 +++
>  gcc/testsuite/g++.target/i386/vect-cmp.C | 23 +++
>  3 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/vect-cmp.C
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 9d2a82c598e..ef4ed8b501a 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -4467,6 +4467,13 @@
>    operands[0] = lowpart_subreg (V16QImode, operands[0], <MODE>mode);
>  })
>
> +(define_expand "andn<mode>3"
> +  [(set (match_operand:MMXMODEI 0 "register_operand")
> +(and:MMXMODEI
> +  (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand"))
> +  (match_operand:MMXMODEI 2 "register_operand")))]
> +  "TARGET_SSE2")
> +
>  (define_insn "mmx_andnot<mode>3"
>[(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x,v")
> (and:MMXMODEI
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index a45b50ad732..7be31334667 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -18438,6 +18438,13 @@
>   (match_operand:VI_AVX2 2 "vector_operand")))]
>"TARGET_SSE2")
>
> +(define_expand "andn<mode>3"
> +  [(set (match_operand:VI 0 "register_operand")
> +   (and:VI
> + (not:VI (match_operand:VI 2 "register_operand"))
> + (match_operand:VI 1 "register_operand")))]
> +  "TARGET_SSE2")
> +
>  (define_expand "_andnot3_mask"
>[(set (match_operand:VI48_AVX512VL 0 "register_operand")
> (vec_merge:VI48_AVX512VL
> diff --git a/gcc/testsuite/g++.target/i386/vect-cmp.C 
> b/gcc/testsuite/g++.target/i386/vect-cmp.C
> new file mode 100644
> index 000..c154474fa51
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/vect-cmp.C
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=x86-64-v3 -fdump-tree-optimized" } */
> +
> +#define vect8 __attribute__((vector_size(8) ))
> +#define vect16 __attribute__((vector_size(16) ))
> +#define vect32 __attribute__((vector_size(32) ))
> +
> +vect8 int bar0 (vect8 float a, vect8 float b, vect8 int c)
> +{
> +  return (a > b) ? 0 : c;
> +}
> +
> +vect16 int bar1 (vect16 float a, vect16 float b, vect16 int c)
> +{
> +  return (a > b) ? 0 : c;
> +}
> +
> +vect32 int bar2 (vect32 float a, vect32 float b, vect32 int c)
> +{
> +  return (a > b) ? 0 : c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times ".BIT_ANDN " 3 "optimized" { target { ! 
> ia32 } } } } */
> --
> 2.34.1
>


[PATCH]AArch64 re-enable memory access costing after SLP change.

2024-10-15 Thread Tamar Christina
Hi All,

While chasing down a costing difference between SLP and non-SLP for memory
access costing I noticed that at some point the SLP and non-SLP costing
diverged.  It used to be that we only supported LOAD_LANES in SLP, and so
the non-SLP costing was working fine.

But with the change to SLP-only we have now lost this costing.

It looks like the vectorizer for non-SLP stores the VMAT type in
STMT_VINFO_MEMORY_ACCESS_TYPE on the stmt_info, but for SLP it stores it in
SLP_TREE_MEMORY_ACCESS_TYPE which is on the SLP node itself.

While my first attempt at a patch was to just also store the VMAT in the
stmt_info, https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665295.html,
Richi pointed out that this goes wrong when the same access is used in
hybrid SLP.
And so we have to do a backend-specific fix.  To help out other backends
this also introduces a generic helper function suggested by Richi in that
patch (I hope that's OK; I didn't want to split out just the helper).
This successfully restores VMAT based costing in the new SLP only world.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vectorizer.h (vect_mem_access_type): New.
* config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use it.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_adjust_stmt_cost): Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Make SLP node named.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
102680a0efca1ce928e6945033c01cfb68a65152..055b0ff47c68dc5e7560debe5a29dcdc9df21f8c
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16278,7 +16278,7 @@ public:
 private:
   void record_potential_advsimd_unrolling (loop_vec_info);
   void analyze_loop_vinfo (loop_vec_info);
-  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info,
+  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info, slp_tree,
  aarch64_vec_op_count *);
   fractional_cost adjust_body_cost_sve (const aarch64_vec_op_count *,
fractional_cost, unsigned int,
@@ -16599,7 +16599,8 @@ aarch64_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
vector of an LD[234] or ST[234] operation.  Return the total number of
vectors (2, 3 or 4) if so, otherwise return a value outside that range.  */
 static int
-aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info stmt_info)
+aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
+slp_tree node)
 {
   if ((kind == vector_load
|| kind == unaligned_load
@@ -16609,7 +16610,7 @@ aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info)
 {
   stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
   if (stmt_info
- && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+ && vect_mem_access_type (stmt_info, node) == VMAT_LOAD_STORE_LANES)
return DR_GROUP_SIZE (stmt_info);
 }
   return 0;
@@ -16847,14 +16848,15 @@ aarch64_detect_scalar_stmt_subtype (vec_info *vinfo, 
vect_cost_for_stmt kind,
 }
 
 /* STMT_COST is the cost calculated by aarch64_builtin_vectorization_cost
-   for the vectorized form of STMT_INFO, which has cost kind KIND and which
-   when vectorized would operate on vector type VECTYPE.  Try to subdivide
-   the target-independent categorization provided by KIND to get a more
+   for the vectorized form of STMT_INFO possibly using SLP node NODE, which has cost
+   kind KIND and which when vectorized would operate on vector type VECTYPE.  Try to
+   subdivide the target-independent categorization provided by KIND to get a more
    accurate cost.  WHERE specifies where the cost associated with KIND
    occurs.  */
 static fractional_cost
 aarch64_detect_vector_stmt_subtype (vec_info *vinfo, vect_cost_for_stmt kind,
-   stmt_vec_info stmt_info, tree vectype,
+   stmt_vec_info stmt_info, slp_tree node,
+   tree vectype,
enum vect_cost_model_location where,
fractional_cost stmt_cost)
 {
@@ -16880,7 +16882,7 @@ aarch64_detect_vector_stmt_subtype (vec_info *vinfo, 
vect_cost_for_stmt kind,
  cost by the number of elements in the vector.  */
   if (kind == scalar_load
   && sve_costs
-  && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+  && vect_mem_access_type (stmt_info, node) == VMAT_GATHER_SCATTER)
 {
   unsigned int nunits = vect_nunits_for_cost (vectype);
   /* Test for VNx2 modes, which have 64-bit containers.  */
@@ -16893,7 +16895,7 @@ aarch64_detect_vector_stmt_subtype (vec_info *vinfo, 
vect_cost_for_stmt kind,
  in a scatter operati

[PATCH] tree-optimization/117147 - bogus re-use of previous ldst_p

2024-10-15 Thread Richard Biener
The following shows that in vect_build_slp_tree_1 we're eventually
re-using the previous lane set ldst_p flag.  Fixed by some
refactoring.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/117147
* tree-vect-slp.cc (vect_build_slp_tree_1): Put vars and
initialization of per-lane data into the per-lane processing
loop to avoid re-using previous lane state.
---
 gcc/tree-vect-slp.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 16332e0b6d7..8727246c27a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1072,14 +1072,13 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   stmt_vec_info first_stmt_info = stmts[0];
   code_helper first_stmt_code = ERROR_MARK;
   code_helper alt_stmt_code = ERROR_MARK;
-  code_helper rhs_code = ERROR_MARK;
   code_helper first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
+  tree first_op1 = NULL_TREE;
   stmt_vec_info first_load = NULL, prev_first_load = NULL;
-  bool first_stmt_ldst_p = false, ldst_p = false;
-  bool first_stmt_phi_p = false, phi_p = false;
+  bool first_stmt_ldst_p = false;
+  bool first_stmt_phi_p = false;
   int first_reduc_idx = -1;
   bool maybe_soft_fail = false;
   tree soft_fail_nunits_vectype = NULL_TREE;
@@ -1088,6 +1087,10 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   stmt_vec_info stmt_info;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
 {
+  bool ldst_p = false;
+  bool phi_p = false;
+  code_helper rhs_code = ERROR_MARK;
+
   swap[i] = 0;
   matches[i] = false;
   if (!stmt_info)
@@ -1139,7 +1142,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  return false;
}
 
-  tree nunits_vectype;
+  tree vectype, nunits_vectype;
   if (!vect_get_vector_types_for_stmt (vinfo, stmt_info, &vectype,
   &nunits_vectype, group_size))
{
-- 
2.43.0


[PATCH] middle-end/117137 - expansion issue with vector equality compares

2024-10-15 Thread Richard Biener
When expanding a COND_EXPR with a vector equality compare as the condition,
expand_cond_expr_using_cmove fails to properly take the cbranch path.
I failed to massage its twisted logic, so the simple fix is to make
sure a vector condition is expanded separately, which also generates
the expected code for the testcase:

ptest   %xmm0, %xmm0
cmovne  %edi, %eax

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  Will
push if it succeeds.

Richard.

PR middle-end/117137
* expr.cc (expand_cond_expr_using_cmove): Make sure to
expand vector comparisons separately.

* gcc.dg/torture/pr117137.c: New testcase.
---
 gcc/expr.cc |  6 --
 gcc/testsuite/gcc.dg/torture/pr117137.c | 13 +
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117137.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7a471f20e79..da486cf85fd 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9524,7 +9524,8 @@ expand_cond_expr_using_cmove (tree treeop0 
ATTRIBUTE_UNUSED,
   EXPAND_NORMAL);
 
   if (TREE_CODE (treeop0) == SSA_NAME
-  && (srcstmt = get_def_for_expr_class (treeop0, tcc_comparison)))
+  && (srcstmt = get_def_for_expr_class (treeop0, tcc_comparison))
+  && !VECTOR_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (srcstmt
 {
   type = TREE_TYPE (gimple_assign_rhs1 (srcstmt));
   enum tree_code cmpcode = gimple_assign_rhs_code (srcstmt);
@@ -9534,7 +9535,8 @@ expand_cond_expr_using_cmove (tree treeop0 
ATTRIBUTE_UNUSED,
   unsignedp = TYPE_UNSIGNED (type);
   comparison_code = convert_tree_comp_to_rtx (cmpcode, unsignedp);
 }
-  else if (COMPARISON_CLASS_P (treeop0))
+  else if (COMPARISON_CLASS_P (treeop0)
+  && !VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (treeop0, 0
 {
   type = TREE_TYPE (TREE_OPERAND (treeop0, 0));
   enum tree_code cmpcode = TREE_CODE (treeop0);
diff --git a/gcc/testsuite/gcc.dg/torture/pr117137.c 
b/gcc/testsuite/gcc.dg/torture/pr117137.c
new file mode 100644
index 000..b6ce78d8608
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117137.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-msse4" { target { x86_64-*-* i?86-*-* } } } */
+
+long x[2];
+
+int
+foo (int c)
+{
+  long x0 = x[0], x1 = x[1];
+  int t = x0 != 0 | x1 != 0;
+  c *= t;
+  return c;
+}
-- 
2.43.0


[PATCH] tree-optimization/117138 - fix ICE with vector comparison in COND_EXPR

2024-10-15 Thread Richard Biener
The range folding code for COND_EXPRs missed a check of whether the
comparison operand type is supported.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.  I'll
push if that succeeds.  There might be other places missing such
a check, not sure.

Richard.

PR tree-optimization/117138
* gimple-range-fold.cc (fold_using_range::condexpr_adjust):
Check if the comparison operand type is supported.

* gcc.dg/torture/pr117138.c: New testcase.
---
 gcc/gimple-range-fold.cc|  3 ++-
 gcc/testsuite/gcc.dg/torture/pr117138.c | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117138.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 65d31adde54..dcd0cae0351 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1139,7 +1139,8 @@ fold_using_range::condexpr_adjust (vrange &r1, vrange 
&r2, gimple *, tree cond,
   || TREE_CODE_CLASS (gimple_assign_rhs_code (cond_def)) != tcc_comparison)
 return false;
   tree type = TREE_TYPE (gimple_assign_rhs1 (cond_def));
-  if (!range_compatible_p (type, TREE_TYPE (gimple_assign_rhs2 (cond_def
+  if (!value_range::supports_type_p (type)
+  || !range_compatible_p (type, TREE_TYPE (gimple_assign_rhs2 (cond_def
 return false;
   range_op_handler hand (gimple_assign_rhs_code (cond_def));
   if (!hand)
diff --git a/gcc/testsuite/gcc.dg/torture/pr117138.c 
b/gcc/testsuite/gcc.dg/torture/pr117138.c
new file mode 100644
index 000..b32585d3a56
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117138.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-msse4" { target { x86_64-*-* i?86-*-* } } } */
+
+int a, b;
+_Complex long c;
+
+void
+foo ()
+{
+  do
+b = c || a;
+  while (a);
+}
-- 
2.43.0


Re: [PATCH] tree-optimization/116907 - stale BLOCK reference from DECL_VALUE_EXPR

2024-10-15 Thread Richard Biener
On Sun, 13 Oct 2024, Richard Biener wrote:

> When we remove unused BLOCKs we fail to clean references to them
> from DECL_VALUE_EXPRs of variables in other BLOCKs which in the
> PR causes LTO streaming to walk into pointers to GGC freed blocks.
> 
> There's the question of whether such DECL_VALUE_EXPRs should keep
> variables and blocks referenced live (it doesn't seem to do that)
> and whether such DECL_VALUE_EXPRs should have survived in the
> first place.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

I've applied this now; cleaning is required unless the DECL_VALUE_EXPR
itself shouldn't be there, in which case it is harmless in general.
We do stream DECL_VALUE_EXPR to LTO, so some are definitely expected.

Richard.

> Thanks,
> Richard.
> 
>   PR tree-optimization/116907
>   * tree-ssa-live.cc (clear_unused_block_pointer_in_block): New
>   helper.
>   (clear_unused_block_pointer): Call it.
> ---
>  gcc/tree-ssa-live.cc | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
> index 0739faa022e..484698899cf 100644
> --- a/gcc/tree-ssa-live.cc
> +++ b/gcc/tree-ssa-live.cc
> @@ -612,6 +612,22 @@ clear_unused_block_pointer_1 (tree *tp, int *, void *)
>return NULL_TREE;
>  }
>  
> +/* Clear references to unused BLOCKs from DECL_VALUE_EXPRs of variables
> +   in BLOCK.  */
> +
> +static void
> +clear_unused_block_pointer_in_block (tree block)
> +{
> +  for (tree t = BLOCK_VARS (block); t; t = DECL_CHAIN (t))
> +if (VAR_P (t) && DECL_HAS_VALUE_EXPR_P (t))
> +  {
> + tree val = DECL_VALUE_EXPR (t);
> + walk_tree (&val, clear_unused_block_pointer_1, NULL, NULL);
> +  }
> +  for (tree t = BLOCK_SUBBLOCKS (block); t; t = BLOCK_CHAIN (t))
> +clear_unused_block_pointer_in_block (t);
> +}
> +
>  /* Set all block pointer in debug or clobber stmt to NULL if the block
> is unused, so that they will not be streamed out.  */
>  
> @@ -667,6 +683,10 @@ clear_unused_block_pointer (void)
> walk_tree (gimple_op_ptr (stmt, i), clear_unused_block_pointer_1,
>NULL, NULL);
>}
> +
> +  /* Walk all variables mentioned in the functions BLOCK tree and clear
> + DECL_VALUE_EXPR from unused blocks where present.  */
> +  clear_unused_block_pointer_in_block (DECL_INITIAL (current_function_decl));
>  }
>  
>  /* Dump scope blocks starting at SCOPE to FILE.  INDENT is the
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 3/3] AArch64: Add support for SIMD xor immediate

2024-10-15 Thread Wilco Dijkstra

Add support for SVE xor immediate when generating AdvSIMD code and SVE is 
available.
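As an illustration (my example, not from the patch): flipping the sign bit
of a float vector currently needs the constant materialized first, whereas
the SVE form encodes it as an immediate on the shared low 128 bits:

	// AdvSIMD only: materialize, then a two-register eor
	movi	v31.4s, 0x80, lsl 24
	eor	v0.16b, v0.16b, v31.16b
	// with SVE available: a single instruction
	eor	z0.s, z0.s, #0x80000000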

Passes bootstrap & regress, OK for commit?

gcc/ChangeLog:

* config/aarch64/aarch64.cc (enum simd_immediate_check): Add 
AARCH64_CHECK_XOR.
(aarch64_simd_valid_xor_imm): New function.
(aarch64_output_simd_imm): Add AARCH64_CHECK_XOR support.
(aarch64_output_simd_xor_imm): New function.
* config/aarch64/aarch64-protos.h (aarch64_output_simd_xor_imm): New 
prototype.
(aarch64_simd_valid_xor_imm): New prototype.
* config/aarch64/aarch64-simd.md (xor<mode>3):
Use aarch64_reg_or_xor_imm predicate and add an immediate alternative.
* config/aarch64/predicates.md (aarch64_reg_or_xor_imm): Add new 
predicate.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/simd_imm.c: New test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
3f2d40603426a590a0a14ba4792fe9b325d1e585..16ab79c02da62c1a8aa03309708dfe401d1ffb7e
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -827,6 +827,7 @@ char *aarch64_output_scalar_simd_mov_immediate (rtx, 
scalar_int_mode);
 char *aarch64_output_simd_mov_imm (rtx, unsigned);
 char *aarch64_output_simd_orr_imm (rtx, unsigned);
 char *aarch64_output_simd_and_imm (rtx, unsigned);
+char *aarch64_output_simd_xor_imm (rtx, unsigned);
 
 char *aarch64_output_sve_mov_immediate (rtx);
 char *aarch64_output_sve_ptrues (rtx);
@@ -844,6 +845,7 @@ bool aarch64_sve_ptrue_svpattern_p (rtx, struct 
simd_immediate_info *);
 bool aarch64_simd_valid_and_imm (rtx);
 bool aarch64_simd_valid_mov_imm (rtx);
 bool aarch64_simd_valid_orr_imm (rtx);
+bool aarch64_simd_valid_xor_imm (rtx);
 bool aarch64_valid_sysreg_name_p (const char *);
 const char *aarch64_retrieve_sysreg (const char *, bool, bool);
 rtx aarch64_check_zero_based_sve_index_immediate (rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
5c1de57ce6c3f2064d8be25f903a6a8d949685ef..18795a08b61da874a9e811822ed82e7eb9350bb4
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1144,12 +1144,16 @@ (define_insn "ior<mode>3"
   [(set_attr "type" "neon_logic")]
 )
 
+;; For EOR (vector, register) and SVE EOR (vector, immediate)
 (define_insn "xor<mode>3"
-  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
-(xor:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
-(match_operand:VDQ_I 2 "register_operand" "w")))]
+  [(set (match_operand:VDQ_I 0 "register_operand")
+(xor:VDQ_I (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:VDQ_I 2 "aarch64_reg_or_xor_imm")))]
   "TARGET_SIMD"
-  "eor\t%0., %1., %2."
+  {@ [ cons: =0 , 1 , 2  ]
+ [ w, w , w  ] eor\t%0., %1., %2.
+ [ w, 0 , Do ] << aarch64_output_simd_xor_imm (operands[2], 
);
+  }
   [(set_attr "type" "neon_logic")]
 )
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
1a228147e6f945772edbd5540c44167e3a876a74..c019f21e39d9773746792d5885fa0f6805f9bb44
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -134,7 +134,8 @@ constexpr auto AARCH64_STATE_OUT = 1U << 2;
 enum simd_immediate_check {
   AARCH64_CHECK_MOV,
   AARCH64_CHECK_ORR,
-  AARCH64_CHECK_AND
+  AARCH64_CHECK_AND,
+  AARCH64_CHECK_XOR
 };
 
 /* Information about a legitimate vector immediate operand.  */
@@ -23320,6 +23321,13 @@ aarch64_simd_valid_and_imm (rtx op)
   return aarch64_simd_valid_imm (op, NULL, AARCH64_CHECK_AND);
 }
 
+/* Return true if OP is a valid SIMD xor immediate for SVE.  */
+bool
+aarch64_simd_valid_xor_imm (rtx op)
+{
+  return aarch64_simd_valid_imm (op, NULL, AARCH64_CHECK_XOR);
+}
+
 /* Check whether X is a VEC_SERIES-like constant that starts at 0 and
has a step in the range of INDEX.  Return the index expression if so,
otherwise return null.  */
@@ -25503,10 +25511,12 @@ aarch64_output_simd_imm (rtx const_vector, unsigned 
width,
 }
   else
 {
-  /* AARCH64_CHECK_ORR or AARCH64_CHECK_AND.  */
+  /* AARCH64_CHECK_ORR, AARCH64_CHECK_AND or AARCH64_CHECK_XOR.  */
   mnemonic = "orr";
   if (which == AARCH64_CHECK_AND)
mnemonic = info.insn == simd_immediate_info::MVN ? "bic" : "and";
+  else if (which == AARCH64_CHECK_XOR)
+   mnemonic = "eor";
 
   if (info.insn == simd_immediate_info::SVE_MOV)
{
@@ -25544,6 +25554,14 @@ aarch64_output_simd_and_imm (rtx const_vector, 
unsigned width)
   return aarch64_output_simd_imm (const_vector, width, AARCH64_CHECK_AND);
 }
 
+/* Returns the string with the EOR instruction for the SIMD immediate
+   CONST_VECTOR of WIDTH bits.  */
+char*
+aarch64_output_simd_xor_imm (rtx const_vector, unsigned width)
+{
+  return aarch64_output_simd_imm (const_vector, width, AARCH64_CHECK_XOR);
+}
+
 /* Returns the string with the MOV instruction for the SIMD immedia

Re: [PATCH v2] alpha: Add -mlra option

2024-10-15 Thread John Paul Adrian Glaubitz
CC'ing Maciej who has also worked on Alpha

Hi Uros,

On Tue, 2024-10-15 at 12:29 +0200, Uros Bizjak wrote:
> On Tue, Oct 15, 2024 at 11:09 AM John Paul Adrian Glaubitz
>  wrote:
> > 
> > PR target/66207
> > * config/alpha/alpha.opt (mlra): New target option.
> > * config/alpha/alpha.cc (alpha_use_lra_p): New function.
> > (TARGET_LRA_P): Use it.
> > * config/alpha/alpha.opt.urls: Regenerate.
> 
> IMO, we should simply deprecate non-BWX targets. If reload is going
> away, then there is no way for non-BWX targets to access reload
> internals they require for compilation. As mentioned in the PR,
> non-BWX targets are removed from distros anyway, so I guess there is
> no point to invest much time to modernize them,

While Debian dropped support for non-BWX targets, NetBSD still supports
them, from what I know. On Gentoo, users can actually configure the target
baseline themselves.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] c, v2: Implement C2Y N3355 - Named Loops [PR117022]

2024-10-15 Thread Joseph Myers
On Fri, 11 Oct 2024, Jakub Jelinek wrote:

> On Fri, Oct 11, 2024 at 02:19:08PM +, Joseph Myers wrote:
> > There should definitely be a test that -std=c23 -pedantic-errors gives 
> > errors for these constructs (I'd say also test that -std=c23 
> > -pedantic-errors -Wno-c23-c2y-compat doesn't diagnose them, while -std=c2y 
> > -Wc23-c2y-compat does).  Not yet reviewed the rest of the patch.
> 
> Added those now.  I've additionally added a testcase to make sure
> /* FALLTHRU */ comments don't break it (thankfully they don't, in that
> case just a flag is set on the label), but that revealed that there was
> a -Wunused-value warning if some labels are just used to name loops and
> used in break/continue statement and nowhere else.  And another test
> to make sure [[fallthrough]]; does break it, the labels before that aren't
> in the same labeled-statement anymore.

What happens with a statement attribute on the iteration or switch 
statement?

  label: [[]] for (;;) break label;

There are no standard statement attributes (and the only non-OMP one we 
handle is musttail, which isn't applicable on such statements) so this 
example uses [[]].  The wording in the paper about "is an iteration or 
switch statement" isn't wonderfully clear about the case of an iteration 
or switch statement with attributes on it (and while we discussed the 
pragma case in WG14, the attribute case wasn't mentioned).  Whatever we 
do, there should be an associated test.

>  /* Parse a statement, other than a labeled statement.  CHAIN is a vector
> @@ -7662,6 +7706,7 @@ c_parser_statement (c_parser *parser, bo
>  
>  static void
>  c_parser_statement_after_labels (c_parser *parser, bool *if_p,
> +  tree before_labels,
>vec *chain, attr_state astate)

Should update the comment on this function to mention the new parameter.  
Likewise all other functions with this parameter added.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH 2/2] gcc: Add --enable-multilib-space option

2024-10-15 Thread Joseph Myers
On Mon, 14 Oct 2024, Keith Packard wrote:

>   * Makefile.in: Expand multilib set when --enable-multilib-space
>   * configure.ac: Support --enable-multilib-space option
>   * configure: Regenerate

This should be documented in install.texi.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] libgccjit: Allow sending a const pointer as argument

2024-10-15 Thread Antoni Boucher

David: Ping.

On 2024-02-17 at 11:55, Antoni Boucher wrote:

David: Ping.

On Fri, 2024-01-19 at 15:59 -0500, Antoni Boucher wrote:

David: Ping.

On Thu, 2023-12-21 at 11:59 -0500, Antoni Boucher wrote:

Hi.
This patch adds the ability to pass a const pointer as an argument to a
function.
Thanks for the review.






[PATCH] c, v3: Implement C2Y N3355 - Named Loops [PR117022]

2024-10-15 Thread Jakub Jelinek
On Tue, Oct 15, 2024 at 05:00:04PM +, Joseph Myers wrote:
> What happens with a statement attribute on the iteration or switch 
> statement?
> 
>   label: [[]] for (;;) break label;

Except for the omp::directive/omp::sequence attributes, the label is accepted
as a loop label.
Those OpenMP attributes make the label ignored right now (it is effectively
parsed as if there were a #pragma omp in between, and loops are parsed
completely differently).
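For reference, a minimal example of the N3355 feature under discussion
(mine, not from the patch):

  void
  f (int m[4][4])
  {
   outer:
    for (int i = 0; i < 4; i++)
      for (int j = 0; j < 4; j++)
	{
	  if (m[i][j] < 0)
	    continue outer;	/* next iteration of the i loop */
	  if (m[i][j] == 99)
	    break outer;	/* leaves both loops */
	}
  }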

> There are no standard statement attributes (and the only non-OMP one we 
> handle is musttail, which isn't applicable on such statements) so this 
> example uses [[]].  The wording in the paper about "is an iteration or 
> switch statement" isn't wonderfully clear about the case of an iteration 
> or switch statement with attributes on it (and while we discussed the 
> pragma case in WG14, the attribute case wasn't mentioned).  Whatever we 
> do, there should be an associated test.

Added test coverage.

> >  /* Parse a statement, other than a labeled statement.  CHAIN is a vector
> > @@ -7662,6 +7706,7 @@ c_parser_statement (c_parser *parser, bo
> >  
> >  static void
> >  c_parser_statement_after_labels (c_parser *parser, bool *if_p,
> > +tree before_labels,
> >  vec *chain, attr_state astate)
> 
> Should update the comment on this function to mention the new parameter.  
> Likewise all other functions with this parameter added.

Done.

Here is a new version of the patch.  It tests fine with dg.exp=*named-loops*;
I think it doesn't need more testing given that it is just comment changes
in code plus testsuite changes.

2024-10-12  Jakub Jelinek  

PR c/117022
gcc/c-family/
* c-common.def (FOR_STMT, WHILE_STMT, DO_STMT, BREAK_STMT,
CONTINUE_STMT, SWITCH_STMT): Add an extra operand, *_NAME
and document it.
* c-common.h (bc_hash_map_t): New typedef.
(struct bc_state): Add bc_hash_map member.
(WHILE_NAME, DO_NAME, FOR_NAME, BREAK_NAME, CONTINUE_NAME,
SWITCH_STMT_NAME): Define.
* c-pretty-print.cc (c_pretty_printer::statement): Print
BREAK_STMT or CONTINUE_STMT operand if any.
* c-gimplify.cc (bc_hash_map): New static variable.
(note_named_bc, release_named_bc): New functions.
(save_bc_state): Save and clear bc_hash_map.
(restore_bc_state): Assert NULL and restore bc_hash_map.
(genericize_c_loop): Add NAME argument, call note_named_bc
and release_named_bc if non-NULL around the body walk.
(genericize_for_stmt, genericize_while_stmt, genericize_do_stmt):
Adjust callers of it.
(genericize_switch_stmt): Rename break_block variable to blab.
Call note_named_bc and release_named_bc if SWITCH_STMT_NAME is
non-NULL around the body walk.
(genericize_continue_stmt): Handle non-NULL CONTINUE_NAME.
(genericize_break_stmt): Handle non-NULL BREAK_NAME.
(c_genericize): Delete and clear bc_hash_map.
gcc/c/
* c-tree.h: Implement C2Y N3355 - Named loops.
(C_DECL_LOOP_NAME, C_DECL_SWITCH_NAME, C_DECL_LOOP_SWITCH_NAME_VALID,
C_DECL_LOOP_SWITCH_NAME_USED, IN_NAMED_STMT): Define.
(c_get_loop_names, c_release_loop_names, c_finish_bc_name): Declare.
(c_start_switch): Add NAME argument.
(c_finish_bc_stmt): Likewise.
* c-lang.h (struct language_function): Add loop_names and
loop_names_hash members.
* c-parser.cc (c_parser_external_declaration,
c_parser_declaration_or_fndef, c_parser_struct_or_union_specifier,
c_parser_parameter_declaration): Adjust c_parser_pragma caller.
(get_before_labels): New function.
(c_parser_compound_statement_nostart): Call get_before_labels when
needed, adjust c_parser_pragma and c_parser_statement_after_labels
callers.
(c_parser_statement): Call get_before_labels first and pass it to
c_parser_statement_after_labels.
(c_parser_bc_name): New function.
(c_parser_statement_after_labels): Add BEFORE_LABELS argument.  Pass
it down to c_parser_switch_statement, c_parser_while_statement,
c_parser_do_statement, c_parser_for_statement and c_parser_pragma.
Call c_parser_bc_name for RID_BREAK and RID_CONTINUE and pass it as
another argument to c_finish_bc_stmt.
(c_parser_if_body, c_parser_else_body): Call get_before_labels
early and pass it to c_parser_statement_after_labels.
(c_parser_switch_statement): Add BEFORE_LABELS argument.  Call
c_get_loop_names, if named, pass switch_name to c_start_switch,
mark it valid and set IN_NAMED_STMT bit in in_statement before
parsing body, otherwise clear IN_NAMED_STMT bit before that parsing.
Run c_release_loop_names at the end.
(c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement): Add BEFORE_LABELS argument.  Call

Re: [PATCH] c, libcpp: Partially implement C2Y N3353 paper [PR117028]

2024-10-15 Thread Joseph Myers
On Tue, 15 Oct 2024, Jakub Jelinek wrote:

> --- gcc/testsuite/gcc.dg/cpp/c23-delimited-escape-seq-1.c.jj	2024-10-14 17:58:54.436815339 +0200
> +++ gcc/testsuite/gcc.dg/cpp/c23-delimited-escape-seq-1.c	2024-10-14 17:59:05.032666716 +0200
> @@ -0,0 +1,87 @@
> +/* P2290R3 - Delimited escape sequences */

I don't think the comments on this and other C tests should reference a 
C++ paper.

I think there should also be tests using digit separators with the 0o / 0O 
prefixes (both valid cases, and testing the error for having the digit 
separator immediately after 0o / 0O).
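
Something along these lines, say (illustrative sketches, not the
committed tests):

  int ok  = 0o1'7;   /* valid: digit separator between octal digits */
  int bad = 0o'17;   /* invalid: digit separator right after the 0o prefix */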

-- 
Joseph S. Myers
josmy...@redhat.com



[pushed] c++: add fixed testcase [PR80637]

2024-10-15 Thread Patrick Palka
Fixed by r15-4340-gcacbb4daac3e9a.

PR c++/80637

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-fn9.C: New test.
---
 gcc/testsuite/g++.dg/cpp2a/concepts-fn9.C | 15 +++
 1 file changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn9.C

diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn9.C b/gcc/testsuite/g++.dg/cpp2a/concepts-fn9.C
new file mode 100644
index 000..eb2963afcc9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn9.C
@@ -0,0 +1,15 @@
+// PR c++/80637
+// { dg-do compile { target c++20 } }
+
+template<class T, class U>
+concept same_as = __is_same(T, U);
+
+template<class T>
+struct A {
+  void f(int) requires same_as<T, int>;
+  void f(...) requires (!same_as<T, int>);
+};
+
+auto fptr = &A<int>::f;
+using type = decltype(fptr);
+using type = void (A<int>::*)(int);
-- 
2.47.0.72.gef8ce8f3d4



[committed] i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

2024-10-15 Thread Uros Bizjak
The middle end can generate a SYMBOL_REF RTX as the value "val" in the call
to expand_vector_set, but a SYMBOL_REF RTX is not accepted by the
_pinsr insn pattern, generated via the
VEC_MERGE/VEC_DUPLICATE RTX path.

Force the value into a register before the VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy the nonimmediate_operand predicate.

PR target/117116

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117116.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 2b774ff7c4e..63f5e348d64 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -18263,6 +18263,8 @@ quarter:
   else if (use_vec_merge)
 {
 do_vec_merge:
+  if (!nonimmediate_operand (val, inner_mode))
+   val = force_reg (inner_mode, val);
   tmp = gen_rtx_VEC_DUPLICATE (mode, val);
   tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
   GEN_INT (HOST_WIDE_INT_1U << elt));
diff --git a/gcc/testsuite/gcc.target/i386/pr117116.c b/gcc/testsuite/gcc.target/i386/pr117116.c
new file mode 100644
index 000..d6e28848a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr117116.c
@@ -0,0 +1,18 @@
+/* PR target/117116 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef void (*StmFct)();
+typedef struct {
+  StmFct fct_getc;
+  StmFct fct_putc;
+  StmFct fct_flush;
+  StmFct fct_close;
+} StmInf;
+
+StmInf TTY_Getc_pstm;
+
+void TTY_Getc() {
+  TTY_Getc_pstm.fct_getc = TTY_Getc;
+  TTY_Getc_pstm.fct_putc = TTY_Getc_pstm.fct_flush = TTY_Getc_pstm.fct_close = (StmFct)1;
+}


[PATCH v4] RISC-V: add option -m(no-)autovec-segment

2024-10-15 Thread Patrick O'Neill
From: Greg McGary 

Add option -m(no-)autovec-segment to allow/disallow the autovectorizer
to emit vector segment load/store instructions. This is useful for
performance experiments.
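
As a concrete illustration (not part of the patch), the kind of loop the
autovectorizer can turn into segment loads/stores is a structure split
such as:

  #include <stdint.h>

  /* De-interleave an RGB buffer; with segment instructions enabled
     this can vectorize to a vlseg3e8.v-style sequence.  */
  void
  rgb_split (uint8_t *restrict r, uint8_t *restrict g, uint8_t *restrict b,
             const uint8_t *restrict rgb, int n)
  {
    for (int i = 0; i < n; i++)
      {
        r[i] = rgb[3 * i];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
      }
  }

With -mno-autovec-segment such a loop can still vectorize, just without
the segment instructions, which is what makes the option useful for
performance comparisons.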

gcc/ChangeLog:
	* config/riscv/autovec.md (vec_mask_len_load_lanes,
	vec_mask_len_store_lanes): Predicate with
	TARGET_VECTOR_AUTOVEC_SEGMENT.
	* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT):
	New macro.
	* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
	* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
	testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
---
Relying on CI for testing. Please wait for that testing to complete before
committing.

v4 changelog:
Remove ICE expectation since middle-end ICE has been resolved.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 68 files changed, 409 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_

[PATCH 2/2] Add a new permute optimization step in SLP

2024-10-15 Thread Christoph Müllner
This commit adds a new permute optimization step after running SLP 
vectorization.
Although there are existing places where individual or nested permutes
can be optimized, there are cases where independent permutes can be optimized,
which cannot be expressed in the current pattern matching framework.
The optimization step is run at the end so that permutes from completely
different SLP builds can be optimized.

The initial optimizations implemented can detect some cases where different
"select permutes" (permutes that only use some of the incoming vector lanes)
can be co-located in a single permute. This can optimize some cases where
two_operator SLP nodes have duplicate elements.

Bootstrapped and reg-tested on AArch64 (C, C++, Fortran).

Manolis Tsamis was the patch's initial author before I took it over.

gcc/ChangeLog:

* tree-vect-slp.cc (get_tree_def): Return the definition of a name.
(recognise_perm_binop_perm_pattern): Helper function.
(vect_slp_optimize_permutes): New permute optimization step.
(vect_slp_function): Run the new permute optimization step.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-perm-14.c: New test.
* gcc.target/aarch64/sve/slp-perm-14.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/vect/slp-perm-14.c   |  42 +++
 .../gcc.target/aarch64/sve/slp-perm-14.c  |   3 +
 gcc/tree-vect-slp.cc  | 248 ++
 3 files changed, 293 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-14.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-14.c b/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
new file mode 100644
index 000..f56e3982a62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fdump-tree-slp1-details" } */
+
+#include <stdint.h>
+
+#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
+int t0 = s0 + s1;\
+int t1 = s0 - s1;\
+int t2 = s2 + s3;\
+int t3 = s2 - s3;\
+d0 = t0 + t2;\
+d1 = t1 + t3;\
+d2 = t0 - t2;\
+d3 = t1 - t3;\
+}
+
+int
+x264_pixel_satd_8x4_simplified (uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2)
+{
+  uint32_t tmp[4][4];
+  uint32_t a0, a1, a2, a3;
+  int sum = 0;
+
+  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
+{
+  a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
+  a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
+  a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
+  a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
+  HADAMARD4(tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0, a1, a2, a3);
+}
+
+  for (int i = 0; i < 4; i++)
+{
+  HADAMARD4(a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
+  sum += a0 + a1 + a2 + a3;
+}
+
+  return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "slp1" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
new file mode 100644
index 000..4e0d5175be8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
@@ -0,0 +1,3 @@
+#include "../../../gcc.dg/vect/slp-perm-14.c"
+
+/* { dg-final { scan-assembler-not {\ttbl\t} } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 8794c94ef90..4bf5ccb9cdf 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9478,6 +9478,252 @@ vect_slp_if_converted_bb (basic_block bb, loop_p orig_loop)
   return vect_slp_bbs (bbs, orig_loop);
 }
 
+/* If NAME is an SSA_NAME defined by an assignment, return that assignment.
+   If SINGLE_USE_ONLY is true and NAME has multiple uses, return NULL.  */
+
+static gassign *
+get_tree_def (tree name, bool single_use_only)
+{
+  if (TREE_CODE (name) != SSA_NAME)
+    return NULL;
+
+  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+
+  if (single_use_only && !has_single_use (name))
+    return NULL;
+
+  if (!is_gimple_assign (def_stmt))
+    return NULL;
+
+  return dyn_cast <gassign *> (def_stmt);
+}
+
+/* Helper function for vect_slp_optimize_permutes.  Return true if STMT is an
+   expression of the form:
+
+     src1_perm = VEC_PERM_EXPR <src1, src1, sel1>
+     src2_perm = VEC_PERM_EXPR <src2, src2, sel2>
+     bop1 = src1_perm BINOP1 src2_perm
+     bop2 = src1_perm BINOP2 src2_perm
+     STMT = VEC_PERM_EXPR <bop1, bop2, sel>
+
+   and src1_perm, src2_perm, bop1, bop2 are not used outside of STMT.
+   Return the first two permute statements and the binops through the
+   corresponding pointer arguments.  */
+
+static bool
+recognise_perm_binop_perm_pattern (gassign *stmt,
+  gassign **bop1_out, gassign **bop2_out,
+  gassign **perm1_out, gassign **perm2_out)
+{
+  if (gimple_assign_rhs_code (stmt) != VEC_PERM_EXPR)
+    return false;
+
+  gassign *bop1, *bop2;
+  if

[PATCH 1/2] Reduce lane utilization in VEC_PERM_EXPRs for two_operator nodes

2024-10-15 Thread Christoph Müllner
When two_operator SLP nodes are built, the VEC_PERM_EXPR that merges the result
selects a lane only based on the operator found. If the input nodes have
duplicate elements, there may be more than one way to choose. This commit
changes the policy to prefer reusing an already-selected lane, which can
leave other lanes entirely unused.

For example, given two vectors with duplicates:
  A = {a1, a1, a2, a2}
  B = {b1, b1, b3, b2}

A two_operator node with operators +, -, +, - is currently built as:
  RES = VEC_PERM_EXPR(0, 5, 2, 7)
With this patch, the permutation becomes:
  RES = VEC_PERM_EXPR(0, 4, 2, 6)
Lanes 0 and 2 are reused and lanes 1 and 3 are not utilized anymore.

The direct effect of this change can be seen in the AArch64 test case,
where the simpler permutation allows to lower to a TRN1 instead of an
expensive TBL.
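
For context (not from the patch): TRN1 interleaves the even-numbered
lanes of its two inputs, so for 4-lane vectors

  trn1 (a, b) = { a[0], b[0], a[2], b[2] }   /* == permute { 0, 4, 2, 6 } */

which matches the new selection exactly, while the old { 0, 5, 2, 7 }
selection needs a generic table lookup.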

Bootstrapped and reg-tested on AArch64 (C, C++, Fortran).

Manolis Tsamis was the patch's initial author before I took it over.

gcc/ChangeLog:

	* tree-vect-slp.cc: Reduce lane utilization in VEC_PERM_EXPRs
	for two_operators.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-perm-13.c: New test.
* gcc.target/aarch64/sve/slp-perm-13.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/vect/slp-perm-13.c   | 29 +++
 .../gcc.target/aarch64/sve/slp-perm-13.c  |  4 +++
 gcc/tree-vect-slp.cc  | 21 +-
 3 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-13.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-13.c b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
new file mode 100644
index 000..08639e72fb0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fdump-tree-slp2-details" } */
+
+#define LOAD_VEC(e0, e1, e2, e3, p) \
+int e0 = p[0]; \
+int e1 = p[1]; \
+int e2 = p[2]; \
+int e3 = p[3];
+
+#define STORE_VEC(p, e0, e1, e2, e3) \
+p[0] = e0; \
+p[1] = e1; \
+p[2] = e2; \
+p[3] = e3;
+
+void
+foo (int *p)
+{
+  LOAD_VEC(s0, s1, s2, s3, p);
+
+  int t0 = s0 + s1;
+  int t1 = s0 - s1;
+  int t2 = s2 + s3;
+  int t3 = s2 - s3;
+
+  STORE_VEC(p, t0, t1, t2, t3);
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 4, 2, 6 }" "slp2" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
new file mode 100644
index 000..f5839f273e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
@@ -0,0 +1,4 @@
+#include "../../../gcc.dg/vect/slp-perm-13.c"
+
+/* { dg-final { scan-assembler-not {\ttbl\t} } } */
+
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 16332e0b6d7..8794c94ef90 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2921,7 +2921,26 @@ fail:
	  gassign *ostmt = as_a <gassign *> (ostmt_info->stmt);
  if (gimple_assign_rhs_code (ostmt) != code0)
{
-	  SLP_TREE_LANE_PERMUTATION (node).safe_push (std::make_pair (1, i));
+ /* If the current element can be found in another lane that has
+been used previously then use that one instead.  This can
+happen when the ONE and TWO contain duplicate elements and
+reduces the number of 'active' lanes.  */
+ int idx = i;
+ for (int alt_idx = (int) i - 1; alt_idx >= 0; alt_idx--)
+   {
+		gassign *alt_stmt = as_a <gassign *> (stmts[alt_idx]->stmt);
+ if (gimple_assign_rhs_code (alt_stmt) == code0
+ && gimple_assign_rhs1 (ostmt)
+   == gimple_assign_rhs1 (alt_stmt)
+ && gimple_assign_rhs2 (ostmt)
+   == gimple_assign_rhs2 (alt_stmt))
+   {
+ idx = alt_idx;
+ break;
+   }
+   }
+ SLP_TREE_LANE_PERMUTATION (node)
+   .safe_push (std::make_pair (1, idx));
  ocode = gimple_assign_rhs_code (ostmt);
  j = i;
}
-- 
2.46.0



Re: [PATCH 4/4] c++: enable modules by default in c++20

2024-10-15 Thread Jakub Jelinek
On Fri, Oct 11, 2024 at 10:41:36PM -0400, Jason Merrill wrote:
> The intent is that C++20 module header units obsolete PCH; they serve the
> same function and are more flexible (you can import multiple header units).

Though, simple use of -std=c++20 or -std=c++23 doesn't imply one is using
modules.
Unfortunately there is no special driver option for PCH header writing,
but until now one had to use an explicit -fmodules to disable the PCH
header writing and produce a header unit.

Can't we keep it that way, i.e. only disable PCH with an explicit
-fmodules, and for the implicit case simply write both the PCH and the
module header unit?

Then users can decide what they actually use: either they import a module
header unit, or they include the header and get the PCH.
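
Concretely (illustrative; all.h is a placeholder name):

  // TU built against the header unit:
  import "all.h";     // consumes the compiled header unit

  // TU built the traditional way:
  #include "all.h"    // can pick up a precompiled all.h.gch instead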

Jakub


