date:20241004

Re: [PATCH 2/3] aarch64: libgcc: add prototypes in cpuinfo

2024-10-04 Thread Christophe Lyon

On Fri, 4 Oct 2024 at 10:00, Kyrylo Tkachov  wrote:
>
>
>
> > On 3 Oct 2024, at 21:44, Christophe Lyon  wrote:
> >
> > Add prototypes for __init_cpu_features_resolver and
> > __init_cpu_features to avoid warnings due to -Wmissing-prototypes.
> >
> >libgcc/
> >* config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
> >prototype.
> >(__init_cpu_features): Likewise.
> > ---
> > libgcc/config/aarch64/cpuinfo.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> > index 4b94fca8695..c62a7453e8e 100644
> > --- a/libgcc/config/aarch64/cpuinfo.c
> > +++ b/libgcc/config/aarch64/cpuinfo.c
> > @@ -418,6 +418,7 @@ __init_cpu_features_constructor(unsigned long hwcap,
> >   setCPUFeature(FEAT_INIT);
> > }
> >
> > +void __init_cpu_features_resolver(unsigned long, const __ifunc_arg_t *);
> > void
> > __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
> >   if (__aarch64_cpu_features.features)
> > @@ -425,6 +426,7 @@ __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
> >   __init_cpu_features_constructor(hwcap, arg);
> > }
> >
> > +void __init_cpu_features(void);
> > void __attribute__ ((constructor))
> > __init_cpu_features(void) {
> >   unsigned long hwcap;
>
> I thought the intent of the missing-prototypes warning is to warn about 
> missing prototypes in a header file primarily.

Indeed, that's my understanding too.

> Should these prototypes go into gcc/common/config/aarch64/cpuinfo.h instead?

In that case, compilation of gcc/config/aarch64/aarch64.c fails because:
gcc/common/config/aarch64/cpuinfo.h:96:56: error: ‘__ifunc_arg_t’ does not name a type
and it does not seem obvious how to expose this type in aarch64.c.

IIUC, these functions never have their prototypes exposed/used, and
I'm not even sure how __init_cpu_features is called: in
dispatch_function_versions(), I only see a reference to
__init_cpu_features_resolver?

(But I'm not at all familiar with this code)
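
On how __init_cpu_features gets called: the patch context above shows it is defined with __attribute__ ((constructor)), so the startup code runs it before main and nothing needs to reference it by name. A minimal sketch of that pattern, assuming GCC or Clang; demo_init_features and demo_features are hypothetical names, not the libgcc ones. The separate declaration just ahead of the definition is what keeps -Wmissing-prototypes quiet for an external function that no header declares.

```c
/* Build with, for example: gcc -Wmissing-prototypes demo.c
   (GCC or Clang assumed).  */

/* Hypothetical stand-in for __aarch64_cpu_features.features.  */
unsigned long demo_features;

/* The prototype immediately before the definition is what silences
   -Wmissing-prototypes; no header declares this function.  */
void demo_init_features (void);

/* The constructor attribute makes the startup code call this before
   main, so nothing has to reference it by name.  */
void __attribute__ ((constructor))
demo_init_features (void)
{
  if (demo_features)
    return;            /* Already initialised, e.g. by a resolver.  */
  demo_features = 1;   /* Placeholder for the real feature probing.  */
}
```

By the time main starts, demo_features is already set, even though no call to demo_init_features appears anywhere in the program.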

Thanks,

Christophe


> Thanks,
> Kyrill
>
> > --
> > 2.34.1
> >
>

[Ada] Fix PR ada/116430

2024-10-04 Thread Eric Botcazou

This is a regression present on the 14 branch only: the expander gets confused 
when trying to insert the finalizer of a procedure that contains a package as 
a subunit.  The offending code no longer exists on the mainline so this adds 
the minimal fix to address the issue.

Tested on x86-64/Linux, applied on the 14 branch only.


2024-10-04  Eric Botcazou  

PR ada/116430
* exp_ch7.adb (Build_Finalizer.Create_Finalizer): For the insertion
point of the finalizer, deal with package bodies that are subunits.

-- 
Eric Botcazou

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index e594a534244..123abb63289 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -2051,6 +2051,12 @@ package body Exp_Ch7 is
 and then List_Containing (Finalizer_Insert_Nod) = Stmts)
then
   Finalizer_Insert_Nod := Last_Top_Level_Ctrl_Construct;
+  if Nkind (Finalizer_Insert_Nod) = N_Package_Body
+and then Nkind (Parent (Finalizer_Insert_Nod)) = N_Subunit
+  then
+ Finalizer_Insert_Nod :=
+Corresponding_Stub (Parent (Finalizer_Insert_Nod));
+  end if;
end if;
 
Insert_After (Finalizer_Insert_Nod, Fin_Body);

[PATCH] expr, v2: Don't clear whole unions [PR116416]

2024-10-04 Thread Jakub Jelinek

On Thu, Oct 03, 2024 at 12:14:35PM -0400, Jason Merrill wrote:
> Agreed, the padding bits have indeterminate values (or erroneous in C++26),
> so it's correct for infoleak-1.c to complain about 4b.

I've been afraid what the kernel people would say about this change (because
reading Linus' mails shows he doesn't care about what the standards say,
but what he expects to see, anything else is "broken").

> > Though, looking at godbolt, clang and icc 19 and older gcc all do zero
> > initialize the whole union before storing the single member in there (if
> > non-zero, otherwise just clear).
> > 
> > So whether we want to do this or do it by default is another question.
> 
> We will want to initialize the padding (for all types) to something for
> C++26, but that's a separate issue...

But ideally in a way where uninit warnings know the bits aren't initialized
even if they are.

> > Anyway, bootstrapped/regtested on x86_64-linux and i686-linux successfully.
> > 
> > 2024-09-28  Jakub Jelinek  
> > 
> > PR c++/116416
> > * expr.cc (categorize_ctor_elements_1): Fix up union handling of
> > *p_complete.  Clear it only if num_fields is 0 and the union has
> > at least one FIELD_DECL, set to -1 if either union has no fields
> > and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
> > returned false.
> 
> Hmm, complete_ctor_at_level_p also seems to need a change for this
> understanding of union semantics: "every meaningful byte" depends on the
> active member, so it seems like it should return true for a union iff
> num_elts == 1.

I thought complete_ctor_at_level_p has a single caller, but apparently
that isn't the case, cp/typeck2.cc uses it too.

Here is an updated version of the patch, which
a) moves some of the stuff into complete_ctor_at_level_p (but not
   all the *p_complete = 0; case, for that it would need to change
   so that it passes around the ctor rather than just its type)
b) introduces a new option, so that users can either get the new
   behavior (only what is guaranteed by the standards, the default),
   or previous behavior (union padding zero initialization, no such
   guarantees in structures) or also a guarantee in structures
c) introduces a new CONSTRUCTOR flag which says that the padding bits
   (if any) should be zero initialized (and sets it for now in the C
   FE for C23 {} initializers).

Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23, if there is just empty initializer, I think we already mark
it as incomplete if there are any missing initializers.  Maybe with
some designated initializer games, say
void foo () {
  struct S { char a; long long b; };
  struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
...
}
Is this supposed to initialize padding bits in C23 and then the .c.a = 1
and .c.b = 2 stores preserve those padding bits, so is that supposed
to be different from struct T t2 = { .c = { 1, 2 } };
?  What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?

And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly is zero-initialization done (vs. other types of
initialization) and where is e.g. zero-initialization of a temporary then
(member-wise) copied.
Say
struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);

void
foo ()
{
  T t (1, 2);
  bar (&t);
}
Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits?  Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?
Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.
For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.

Only lightly tested so far, this is mostly for further discussions.
And also a question what exactly does cp/typeck2.cc want from
complete_ctor_at_level_p, e.g. if it wants false for all the cases where
categorize_ctor_elements_1 does *p_complete = 0; (in that case it would need
to know whether CONSTRUCTOR_ZERO_PADDING_BITS flag was set).
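
For code that needs every byte of a union to have a known value whatever the new option is set to, one approach is to write the object representation directly instead of relying on the initializer. A minimal sketch, not part of the patch; union U, init_zeroed and nonzero_bytes are hypothetical names. It uses memcpy for the member value because a plain member store formally leaves the other bytes unspecified in C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: 'a' covers one byte, 'b' covers the whole object.  */
union U { char a; long long b; };

/* Clear every byte, then write the value of 'a' through the object
   representation.  All union members start at offset 0, so the first
   byte is the storage of 'a'.  */
void
init_zeroed (union U *u)
{
  const char one = 1;
  memset (u, 0, sizeof *u);
  memcpy (u, &one, sizeof one);
}

/* Count the non-zero bytes in the object representation.  */
size_t
nonzero_bytes (const union U *u)
{
  const unsigned char *p = (const unsigned char *) u;
  size_t n = 0;
  for (size_t i = 0; i < sizeof *u; i++)
    n += p[i] != 0;
  return n;
}
```

After init_zeroed, reading u.a gives 1 and exactly one byte of the object is non-zero, independent of how the compiler treats union initializers.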

2024-10-04  Jakub Jelinek  

PR c++/116416
gcc/
* flag-types.h (enum zero_init_padding_bits_kind): New type.
* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
* common.opt (fzero-init-padding-bits=): New option.
* expr.cc (categorize_ctor_elements_1): Handle
CONSTRUCTOR_ZERO_PADDING_BITS or
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL.  Fix up
*p_complete = -1; setting for unions.
(complete_ctor_at_level_p): Handle unions differently for
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
* gimple-fold.cc (type_has_padding

Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved

2024-10-04 Thread Siarhei Volkau

On Fri, 4 Oct 2024 at 19:07, Christophe Lyon wrote:

>
> On Fri, 4 Oct 2024 at 16:59, Siarhei Volkau  wrote:
> >
> > Hello,
> >
> > On Fri, 4 Oct 2024 at 16:48, Christophe Lyon wrote:
> > >
> > > Hi!
> > >
> > >
> > > On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau  wrote:
> > > >
> > > > ping
> > > >
> > > > > On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau wrote:
> > > > >
> > > > > This patch deals with consequences but not the root cause though.
> > > > >
> > > > > There are 5 cases which are subjects to rewrite:
> > > > > case #1:
> > > > >   mov ip, r1
> > > > >   add r2, ip
> > > > >   # ip is dead here
> > > > > can be rewritten as:
> > > > >   adds r2, r1
> > >
> > > Why replace 'add' with 'adds' ?
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> >
> > Good catch, actually. Silly answer is:
> > because there's no alternative without {S} for Lo registers in thumb1.
> >
> > Correct me if I'm wrong, I don't think that we have to do something
> > special with CC reg there because conditional execution instructions
> > (thumb1_cbz, cbranchsi4_insn) take care of that.
> > See thumb1_final_prescan_insn.
> >
>
> Not familiar with how this is handled, but my question is more like:
> if the original code is
> case #1:
>  adds r3,r0  ;; or any instruction which sets CC
>   mov ip, r1
>   add r2, ip
>   # ip is dead here
> cbz ...
>
> If you rewrite as
>   adds r3,r0
>   adds r2, r1
>   cbz
> then you change CC and it does not get the value expected by cbz.
>
> Am I missing something?
>
> Thanks,
>
> Christophe
>

Your point is correct in general but look at the thumb1.md.

You will not find a separate "compare" pattern (except one)
and "if_then_else", which relies on the previous compare result.
Because they are combined in one insn pattern, they will be emitted
together as a pair of cmp/cbranch, later than peephole2.

So there's no chance to put an instruction in between by this patch.
But even if it happens somehow, (as I said there's one "compare" insn)
there's a mechanism which tracks condition codes for branch insn.
And if the CC is not matched for the branch insn then extra cmp will be emitted.
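
The concern being discussed can be illustrated with a toy model of case #1. This is only a sketch: struct cpu and the two functions are hypothetical and model just three registers and the Z flag, not how GCC tracks condition codes. Both sequences leave the same value in r2, but only the rewritten one touches the flags, which is why it matters that no flag consumer can sit after the rewritten insns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy machine state: just enough to compare the two sequences.  */
struct cpu { uint32_t r1, r2, ip; bool z; };

/* 'mov ip, r1; add r2, ip': the high-register forms leave the flags
   alone.  */
struct cpu
run_original (struct cpu s)
{
  s.ip = s.r1;
  s.r2 += s.ip;
  return s;
}

/* 'adds r2, r1': same sum, but the Z flag is updated.  ip is not
   written, which is fine because it is dead afterwards.  */
struct cpu
run_rewritten (struct cpu s)
{
  s.r2 += s.r1;
  s.z = (s.r2 == 0);
  return s;
}
```

Starting from r1 = 5, r2 = 7 and Z set, both versions produce r2 = 12; the original keeps Z set while the rewritten one clears it.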

>
> > Thanks
> >
> > Siarhei
> >
> > > > >
> > > > > case #2:
> > > > >   add ip, r1
> > > > >   mov r1, ip
> > > > >   # ip is dead here
> > > > > can be rewritten as:
> > > > >   add r1, ip
> > > > >
> > > > > case #3:
> > > > >   mov ip, r1
> > > > >   add r2, ip
> > > > >   add r3, ip
> > > > >   # ip is dead here
> > > > > can be rewritten as:
> > > > >   adds r2, r1
> > > > >   adds r3, r1
> > > > >
> > > > > case #4:
> > > > >   mov ip, r1
> > > > >   add ip, r2
> > > > >   mov r1, ip
> > > > > can be rewritten as:
> > > > >   adds r1, r2
> > > > >   mov  ip, r1 <- might be eliminated too, if ip is dead
> > > > >
> > > > > case #5 (arbitrary):
> > > > >   mov  r1, ip
> > > > >   subs r2, r1, r2
> > > > >   mov  ip, r2
> > > > >   # r1 is dead here
> > > > > can be rewritten as:
> > > > >   rsbs r1, r2, #0
> > > > >   add  ip, r1
> > > > >   movs r2, ip <- might be eliminated, if r2 is dead
> > > > >
> > > > > Speed profit wasn't checked but size changes are the following:
> > > > >libgcc:  -132 bytes / -0.25%
> > > > >  libc: -1262 bytes / -0.55%
> > > > >  libm:  -384 bytes / -0.42%
> > > > > libstdc++: -2258 bytes / -0.30%
> > > > >
> > > > > > No tests provided because it's hard to force GCC to emit HI_REGS
> > > > > in a small and straightforward function.
> > > > >
> > > > > Signed-off-by: Siarhei Volkau 
> > > > > ---
> > > > >  gcc/config/arm/thumb1.md | 93 
> > > > > +++-
> > > > >  1 file changed, 92 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > > > index d7074b43f60..9da4af9eccd 100644
> > > > > --- a/gcc/config/arm/thumb1.md
> > > > > +++ b/gcc/config/arm/thumb1.md
> > > > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn"
> > > > > (set_attr "conds" "clob")
> > > > > (set_attr "type" "multiple")]
> > > > >  )
> > > > > -
> > > > > +
> > > > > +;; bad code emitted when HI_REGS involved in addition
> > > > > +;; subtract also might happen rarely
> > > > > +
> > > > > +;; case #1:
> > > > > +;; mov ip, r1
> > > > > +;; add r2, ip # ip is dead after that
> > > > > +(define_peephole2
> > > > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > > > +   (match_operand:SI 1 "register_operand" ""))
> > > > > +   (set (match_operand:SI 2 "register_operand" "")
> > > > > +   (plus:SI (match_dup 2) (match_dup 0)))]
> > > > > +  "TARGET_THUMB1
> > > > > +&& peep2_reg_dead_p (2, operands[0])
> > > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > > > > +  [(set (match_dup 2)
> > > > > +   (plus:SI (match_dup 2) (match_dup 1)))]
> > > > > +  "")
> > > > > +
> > > > > +;; case #2:
> > > > > +;; add ip, r1
> > > > > +;; mov r1, ip # ip is dead after that
> > > > > +(define_peephole2
> > > > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > >

[PATCH v3 3/5] openmp: Add support for iterators in 'target update' clauses (C/C++)

2024-10-04 Thread Kwok Cheung Yeung


This patch extends the previous patch to cover to/from clauses in
'target update'.

From 1c8bf84ec99fe2fd371e345f012eb0d84a923153 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 4 Oct 2024 15:16:21 +0100
Subject: [PATCH 3/5] openmp: Add support for iterators in 'target update'
 clauses (C/C++)

This adds support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.

2024-10-04  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_clause_from_to): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Finish iterators for to/from
clauses.

gcc/cp/
* parser.cc (cp_parser_omp_clause_from_to): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Finish iterators for to/from
clauses.

gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Call
check_omp_map_iterators on clauses with iterators.  Skip
gimplification of clause decl and size for clauses with iterators.
* omp-low.cc (lower_omp_target): Call lower_omp_map_iterators on
to/from clauses.
* tree-pretty-print.cc (dump_omp_clause): Call dump_omp_iterators
for to/from clauses with iterators.
* tree.cc (omp_clause_num_ops): Add extra operand for OMP_CLAUSE_FROM
and OMP_CLAUSE_TO.
* tree.h (OMP_CLAUSE_HAS_ITERATORS): Add check for OMP_CLAUSE_TO and
OMP_CLAUSE_FROM.
(OMP_CLAUSE_ITERATORS): Likewise.

gcc/testsuite/
* c-c++-common/gomp/target-update-iterators-1.c: New.
* c-c++-common/gomp/target-update-iterators-2.c: New.
* c-c++-common/gomp/target-update-iterators-3.c: New.

libgomp/
* target.c (gomp_update): Call gomp_merge_iterator_maps.  Free
allocated variables.
* testsuite/libgomp.c-c++-common/target-update-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-3.c: New.
---
 gcc/c/c-parser.cc | 105 +++--
 gcc/c/c-typeck.cc |   5 +-
 gcc/cp/parser.cc  | 111 --
 gcc/cp/semantics.cc   |   5 +-
 gcc/gimplify.cc   |  18 ++-
 gcc/omp-low.cc|   3 +-
 .../gomp/target-update-iterators-1.c  |  20 
 .../gomp/target-update-iterators-2.c  |  17 +++
 .../gomp/target-update-iterators-3.c  |  17 +++
 gcc/tree-pretty-print.cc  |  10 ++
 gcc/tree.cc   |   4 +-
 gcc/tree.h|   8 +-
 libgomp/target.c  |  14 +++
 .../target-update-iterators-1.c   |  65 ++
 .../target-update-iterators-2.c   |  58 +
 .../target-update-iterators-3.c   |  67 +++
 16 files changed, 496 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-3.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 184fc076388..c2a5985c89b 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -19304,8 +19304,11 @@ c_parser_omp_clause_device_type (c_parser *parser, tree list)
to ( variable-list )
 
OpenMP 5.1:
-   from ( [present :] variable-list )
-   to ( [present :] variable-list ) */
+   from ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+   to ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+
+   motion-modifier:
+ present | iterator (iterators-definition)  */
 
 static tree
 c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind,
@@ -19316,15 +19319,88 @@ c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind,
   if (!parens.require_open (parser))
 return list;
 
+  int pos = 1, colon_pos = 0;
+  int iterator_length = 0;
+  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)
+{
+  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type
+ == CPP_OPEN_PAREN)
+   {
+ unsigned int n = pos + 2;
+ if (c_parser_check_balanced_raw_token_sequence (parser, &n)
+&& (c_parser_peek_nth_token_raw (parser, n)->type
+== CPP_CLOSE_PAREN))
+   {
+ iterator_length = n - pos + 1;
+ pos = n;
+   }
+   }
+  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CP

arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-04 Thread Andre Vieira (lists)


Hi,

The patch for 'arm: Fix missed CE optimization for armv8.1-m.main [PR 
116444]' introduced regressions on arm targets that used 'noce' before, 
because it approved all noce optimisations without running the default 
cost check. I'm not sure why this didn't show up in my original testing; 
I suspect you need to test a specific set of targets, as Torbjorn did. 
Thank you for pointing these issues out to me.


Could I ask you to rerun them with this patch? I'll try to do that 
locally too.


Happy to receive reviews, but I'm waiting for Torbjorn and my own 
testing to complete before committing.


When not dealing with the special armv8.1-m.main conditional instructions
case, make sure it uses the default_noce_conversion_profitable_p call to
determine whether the sequence is cost effective.

gcc/ChangeLog:


PR target/116444
* config/arm/arm.cc (arm_noce_conversion_profitable_p): Call
default_noce_conversion_profitable_p when not dealing with the
armv8.1-m.main conditional instructions special cases.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 077c80df4482d168d9694795be68c2eeb8f304d9..fd437f428781673e1d44498d31a47f174e0f57fa 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -36168,7 +36168,7 @@ arm_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
 {
   if (!TARGET_COND_ARITH
   || reload_completed)
-return true;
+return default_noce_conversion_profitable_p (seq, if_info);
 
   if (arm_is_v81m_cond_insn (seq))
 return true;
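
The shape of the fix can be sketched outside GCC as a target hook that defers to the generic cost check instead of approving everything. This is only a model under stated assumptions: struct conversion and both functions are hypothetical stand-ins, the cost comparison is invented for illustration, and the final fall-through is an assumption because the rest of the real function is not shown in the hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the (seq, if_info) arguments of the hook.  */
struct conversion
{
  unsigned seq_cost;          /* Cost of the converted sequence.  */
  unsigned max_cost;          /* Cost the conversion may not exceed.  */
  bool is_special_cond_insn;  /* Plays the role of arm_is_v81m_cond_insn.  */
};

/* Plays the role of default_noce_conversion_profitable_p.  */
bool
default_profitable_p (const struct conversion *c)
{
  return c->seq_cost <= c->max_cost;
}

/* Plays the role of the target hook after the fix.  'cond_arith_path'
   models TARGET_COND_ARITH && !reload_completed.  */
bool
target_profitable_p (const struct conversion *c, bool cond_arith_path)
{
  if (!cond_arith_path)
    return default_profitable_p (c);  /* Before the fix: return true.  */
  if (c->is_special_cond_insn)
    return true;
  /* Assumption: what follows in the real hook is not in the hunk.  */
  return default_profitable_p (c);
}
```

With the early return replaced, a sequence that costs more than the limit is rejected on targets that do not take the special path, which is the behaviour the regressing targets relied on.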

Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved

2024-10-04 Thread Christophe Lyon

On Fri, 4 Oct 2024 at 16:59, Siarhei Volkau  wrote:
>
> Hello,
>
> On Fri, 4 Oct 2024 at 16:48, Christophe Lyon wrote:
> >
> > Hi!
> >
> >
> > On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau  wrote:
> > >
> > > ping
> > >
> > > > On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau wrote:
> > > >
> > > > This patch deals with consequences but not the root cause though.
> > > >
> > > > There are 5 cases which are subjects to rewrite:
> > > > case #1:
> > > >   mov ip, r1
> > > >   add r2, ip
> > > >   # ip is dead here
> > > > can be rewritten as:
> > > >   adds r2, r1
> >
> > Why replace 'add' with 'adds' ?
> >
> > Thanks,
> >
> > Christophe
> >
>
> Good catch, actually. Silly answer is:
> because there's no alternative without {S} for Lo registers in thumb1.
>
> Correct me if I'm wrong, I don't think that we have to do something
> special with CC reg there because conditional execution instructions
> (thumb1_cbz, cbranchsi4_insn) take care of that.
> See thumb1_final_prescan_insn.
>

Not familiar with how this is handled, but my question is more like:
if the original code is
case #1:
 adds r3,r0  ;; or any instruction which sets CC
  mov ip, r1
  add r2, ip
  # ip is dead here
cbz ...

If you rewrite as
  adds r3,r0
  adds r2, r1
  cbz
then you change CC and it does not get the value expected by cbz.

Am I missing something?

Thanks,

Christophe


> Thanks
>
> Siarhei
>
> > > >
> > > > case #2:
> > > >   add ip, r1
> > > >   mov r1, ip
> > > >   # ip is dead here
> > > > can be rewritten as:
> > > >   add r1, ip
> > > >
> > > > case #3:
> > > >   mov ip, r1
> > > >   add r2, ip
> > > >   add r3, ip
> > > >   # ip is dead here
> > > > can be rewritten as:
> > > >   adds r2, r1
> > > >   adds r3, r1
> > > >
> > > > case #4:
> > > >   mov ip, r1
> > > >   add ip, r2
> > > >   mov r1, ip
> > > > can be rewritten as:
> > > >   adds r1, r2
> > > >   mov  ip, r1 <- might be eliminated too, if ip is dead
> > > >
> > > > case #5 (arbitrary):
> > > >   mov  r1, ip
> > > >   subs r2, r1, r2
> > > >   mov  ip, r2
> > > >   # r1 is dead here
> > > > can be rewritten as:
> > > >   rsbs r1, r2, #0
> > > >   add  ip, r1
> > > >   movs r2, ip <- might be eliminated, if r2 is dead
> > > >
> > > > Speed profit wasn't checked but size changes are the following:
> > > >libgcc:  -132 bytes / -0.25%
> > > >  libc: -1262 bytes / -0.55%
> > > >  libm:  -384 bytes / -0.42%
> > > > libstdc++: -2258 bytes / -0.30%
> > > >
> > > > > No tests provided because it's hard to force GCC to emit HI_REGS
> > > > in a small and straightforward function.
> > > >
> > > > Signed-off-by: Siarhei Volkau 
> > > > ---
> > > >  gcc/config/arm/thumb1.md | 93 +++-
> > > >  1 file changed, 92 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > > index d7074b43f60..9da4af9eccd 100644
> > > > --- a/gcc/config/arm/thumb1.md
> > > > +++ b/gcc/config/arm/thumb1.md
> > > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn"
> > > > (set_attr "conds" "clob")
> > > > (set_attr "type" "multiple")]
> > > >  )
> > > > -
> > > > +
> > > > +;; bad code emitted when HI_REGS involved in addition
> > > > +;; subtract also might happen rarely
> > > > +
> > > > +;; case #1:
> > > > +;; mov ip, r1
> > > > +;; add r2, ip # ip is dead after that
> > > > +(define_peephole2
> > > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > > +   (match_operand:SI 1 "register_operand" ""))
> > > > +   (set (match_operand:SI 2 "register_operand" "")
> > > > +   (plus:SI (match_dup 2) (match_dup 0)))]
> > > > +  "TARGET_THUMB1
> > > > +&& peep2_reg_dead_p (2, operands[0])
> > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > > > +  [(set (match_dup 2)
> > > > +   (plus:SI (match_dup 2) (match_dup 1)))]
> > > > +  "")
> > > > +
> > > > +;; case #2:
> > > > +;; add ip, r1
> > > > +;; mov r1, ip # ip is dead after that
> > > > +(define_peephole2
> > > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > > > +   (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "")))
> > > > +   (set (match_dup 1) (match_dup 0))]
> > > > +  "TARGET_THUMB1
> > > > +&& peep2_reg_dead_p (2, operands[0])
> > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > > > +  [(set (match_dup 1)
> > > > +   (plus:SI (match_dup 1) (match_dup 0)))]
> > > > +  "")
> > > > +
> > > > +;; case #3:
> > > > +;; mov ip, r1
> > > > +;; add r2, ip
> > > > +;; add r3, ip # ip is dead after that
> > > > +(define_peephole2
> > > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > > +   (match_operand:SI 1 "register_operand" ""))
> > > > +   (set (match_operand:SI 2 "register_operand" "")
> > > > +   (plus:SI (match_dup 2) (match_dup 0)))
> > > > +   (set (match_operand:SI 3 "register_operand" "")
> > > > +   (plus:SI (match_dup 3) (match_dup 0)))]
> > > > +  "TARGET_THUMB1
> > > > +&& peep2_reg_dead_p (3, operands[0

Re: [PATCH] aarch64: Fix bug with max/min (PR116934)

2024-10-04 Thread Richard Sandiford

 writes:
> In ac4cdf5cb43c0b09e81760e2a1902ceebcf1a135, I introduced a bug where
> I put the new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN, into the
> wrong iterator.
>
> I should have put new unspecs in SVE_COND_FP_MAXMIN but I put it in
> SVE_COND_FP_BINARY_REG instead. That was incorrect because the
> SVE_COND_FP_MAXMIN iterator is being used for predicated floating-point
> maximum/minimum, not SVE_COND_FP_BINARY_REG.
>
> Also added a testcase to validate the new change.
>
> Regression tested on aarch64-unknown-linux-gnu and found no regressions.
> There are some test cases with "libitm" in their directory names which
> appear in compare_tests output as changed tests but it looks like they
> are in the output just because of changed build directories, like from
> build-patched/aarch64-unknown-linux-gnu/./libitm/* to
> build-pristine/aarch64-unknown-linux-gnu/./libitm/*. I didn't think it
> was a cause of concern and have pushed this for review.
>
> gcc/ChangeLog:
>
>   * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
>   UNSPEC_COND_SMIN to correct iterators.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/116934
>   * gcc.target/aarch64/sve2/pr116934.c: New test.

OK, thanks.  I see the only effect of the patch is (rightly) to add
back the constant zero alternatives.

Richard

> ---
>  gcc/config/aarch64/iterators.md  |  8 
>  gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c | 13 +
>  2 files changed, 17 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
>
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 0836dee61c9..fcad236eee9 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -3125,9 +3125,7 @@
>  
>  (define_int_iterator SVE_COND_FP_BINARY_REG
>[UNSPEC_COND_FDIV
> -   UNSPEC_COND_FMULX
> -   UNSPEC_COND_SMAX
> -   UNSPEC_COND_SMIN])
> +   UNSPEC_COND_FMULX])
>  
>  (define_int_iterator SVE_COND_FCADD [UNSPEC_COND_FCADD90
>UNSPEC_COND_FCADD270])
> @@ -3135,7 +3133,9 @@
>  (define_int_iterator SVE_COND_FP_MAXMIN [UNSPEC_COND_FMAX
>UNSPEC_COND_FMAXNM
>UNSPEC_COND_FMIN
> -  UNSPEC_COND_FMINNM])
> +  UNSPEC_COND_FMINNM
> +  UNSPEC_COND_SMAX
> +  UNSPEC_COND_SMIN])
>  
>  (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
> UNSPEC_COND_FMLS
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
> new file mode 100644
> index 000..94fb96ffa7d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast -mcpu=neoverse-v2" } */
> +
> +int a;
> +float *b;
> +
> +void foo() {
> +  for (; a; a--, b += 4) {
> +b[0] = b[1] = b[2] = b[2] > 0 ?: 0;
> +if (b[3] < 0)
> +  b[3] = 0;
> +  }
> +}

Re: [PATCH v1] Add -ftime-report-wall

2024-10-04 Thread David Malcolm

On Thu, 2024-10-03 at 11:15 -0700, Andi Kleen wrote:
> > The only consumer I know of for the JSON time report data is in the
> > integration tests I wrote for -fanalyzer, which assumes that all fields
> > are present when printing, and then goes on to use the "user" times for
> > summarizing; see this commit FWIW:
> > https://github.com/davidmalcolm/gcc-analyzer-integration-tests/commit/5420ce968e6eae886e61486555b54fd460e0d35f
> 
> It seems to be broken even without my changes:
> 
> % ./gcc/cc1plus -ftime-report -fdiagnostics-format=sarif-file ../tsrc/tramp3d-v4.i
> cc1plus: internal compiler error: Segmentation fault

Oops, thanks; I'm tracking this as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116978
and working on a fix.

Dave

Re: [PATCH] x86: Disable stack protector for naked functions

2024-10-04 Thread Uros Bizjak

On Fri, Oct 4, 2024 at 2:11 PM H.J. Lu  wrote:
>
> Since naked functions should not enable stack protector, define
> TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
> for naked functions.
>
> gcc/
>
> PR target/116962
> * config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
> function.
> (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.
>
> gcc/testsuite/
>
> PR target/116962
> * gcc.target/i386/pr116962.c: New file.
>
> OK for master?

OK, also for backports.

Thanks,
Uros.

Re: [to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes

2024-10-04 Thread Maciej W. Rozycki

On Fri, 4 Oct 2024, Jeff Law wrote:

> >   More importantly may I ask you to review the second paragraph of commit
> > 6c3365e715fa ("RISC-V: Also handle sign extension in branch costing") to
> > see if any of the other issues referred there have also been now sorted
> > and mention that in the change description, possibly with a commit hash
> > reference to Andrew P's recent improvements?  And in particular can the
> > branch costs requested be lowered for gcc.target/riscv/cset-sext.c now?
> So with Andrew's changes those tests are no longer sensitive to branch cost at
> all AFAICT.  I suspect we could just remove the explicit branch cost
> directives completely from the C tests.  They'd still be needed for the RTL
> tests since those are unaffected by Andrew's changes.  Thoughts?

 I expected this to be the case given the nature of Andrew's changes.  So 
my suggestion is to set `-mbranch-cost=1' with the C tests instead, so as 
to have the lack of sensitivity to branch costing covered now.

  Maciej

Re: [to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes

2024-10-04 Thread Jeff Law





On 10/3/24 5:40 PM, Maciej W. Rozycki wrote:

>   More importantly may I ask you to review the second paragraph of commit
> 6c3365e715fa ("RISC-V: Also handle sign extension in branch costing") to
> see if any of the other issues referred there have also been now sorted
> and mention that in the change description, possibly with a commit hash
> reference to Andrew P's recent improvements?  And in particular can the
> branch costs requested be lowered for gcc.target/riscv/cset-sext.c now?

So with Andrew's changes those tests are no longer sensitive to branch 
cost at all AFAICT.  I suspect we could just remove the explicit branch 
cost directives completely from the C tests.  They'd still be needed for 
the RTL tests since those are unaffected by Andrew's changes.  Thoughts?


jeff

[PATCH v3 4/5] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran)

2024-10-04 Thread Kwok Cheung Yeung

This patch adds support for iterators in the map clause of OpenMP target 
constructs.


The parsing and translation of iterators in the front-end works the same 
way as for the affinity and depend clauses, except for putting the 
iterator into the OMP_CLAUSE_ITERATOR of the clause.


The iterator gimplification needed to be modified slightly to handle 
Fortran. The difference in how ranges work in loops (i.e. the condition 
on the upper bound is <=, rather than < as in C/C++) needs to be 
compensated for when calculating the iteration count and in the 
iteration loop itself.
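
The bound difference can be made concrete with a small helper. This is a sketch of the arithmetic only, not the gimplifier code; iteration_count and its 'inclusive' parameter are hypothetical. It counts the values of begin:end:step with either the C/C++ bound (< end) or the Fortran bound (<= end).

```c
/* Number of values produced by begin:end:step.  'inclusive' selects the
   Fortran-style bound (<= end); otherwise the C/C++ bound (< end) is
   used.  A zero step is treated as an empty range in this sketch.  */
long
iteration_count (long begin, long end, long step, int inclusive)
{
  if (step == 0)
    return 0;
  if (step > 0)
    {
      if (inclusive)
        end += 1;       /* Turn <= end into < end + 1.  */
      return end > begin ? (end - begin + step - 1) / step : 0;
    }
  if (inclusive)
    end -= 1;           /* Mirror image for a negative step.  */
  return end < begin ? (begin - end - step - 1) / -step : 0;
}
```

For 1:10:3 this gives 3 iterations with the exclusive bound (1, 4, 7) and 4 with the inclusive one (1, 4, 7, 10), which is the one-off adjustment described above.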


During Fortran translation of iterators, statements for the side-effects 
of any translated expressions are placed into BLOCK_SUBBLOCKS of the 
block containing the iterator variables (this also occurs with the other 
clauses supporting iterators). However, the previous lowering of 
iterators into Gimple does not appear to do anything with these 
statements, which causes issues if anything in the loop body references 
these side-effects (typically calculation of array boundaries and 
strides). This appears to be a bug that was simply not triggered by 
existing testcases. These statements are now gimplified into the 
innermost loop body.


The libgomp runtime was modified to handle GOMP_MAP_STRUCTs in 
iterators, which can result from the use of derived types (which I used 
in test cases to implement arrays of pointers). libgomp expects a 
GOMP_MAP_STRUCT map to be followed immediately by a number of maps 
corresponding to the fields of the struct, so an iterator 
GOMP_MAP_STRUCT and its fields need to be expanded in a breadth-first 
order, rather than the usual depth-first manner (which would result in 
multiple GOMP_MAP_STRUCTS, followed by multiple instances of the first 
field, then multiples of the second etc.).
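
The two expansion orders described above can be sketched in C as follows. This is only an illustration of the ordering; the struct and function names are hypothetical and do not correspond to anything in libgomp's target.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: which map (0 = the GOMP_MAP_STRUCT, 1..n = its
   fields) and which iteration of the iterator an expanded entry is for.  */
struct expanded_map
{
  int map;
  int iter;
};

/* The order called "depth-first" in the text: expand each map over all
   iterations before moving to the next map.  This yields all struct
   entries, then all instances of field 1, then all of field 2, etc.  */
static size_t
expand_map_by_map (int nmaps, int niters, struct expanded_map *out)
{
  size_t n = 0;
  for (int m = 0; m < nmaps; m++)
    for (int i = 0; i < niters; i++)
      out[n++] = (struct expanded_map) { m, i };
  return n;
}

/* The order called "breadth-first" in the text: expand every map for one
   iteration before moving to the next iteration, so each struct entry is
   immediately followed by its own fields.  */
static size_t
expand_iteration_by_iteration (int nmaps, int niters,
			       struct expanded_map *out)
{
  size_t n = 0;
  for (int i = 0; i < niters; i++)
    for (int m = 0; m < nmaps; m++)
      out[n++] = (struct expanded_map) { m, i };
  return n;
}
```

For one struct with two fields and two iterations, the first function produces S0 S1 A0 A1 B0 B1, while the second produces S0 A0 B0 S1 A1 B1, which is the layout a struct map followed by its fields requires.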


The presence of variables in the field offset triggers the unwanted 
creation of GOMP_MAP_STRUCT_UNORD for variable offsets. The offset tree 
is now walked over and if it only contains iterator variables, then the 
offset is treated as constant again (which it is, within the context of 
each iteration of the iterator).

From a24aa032c2e23577d4fbc61df6da79345bae8292 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 4 Oct 2024 15:16:29 +0100
Subject: [PATCH 4/5] openmp, fortran: Add support for map iterators in OpenMP
 target construct (Fortran)

This adds support for iterators in map clauses within OpenMP
'target' constructs in Fortran.

Some special handling for struct field maps has been added to libgomp in
order to handle arrays of derived types.

2024-10-04  Kwok Cheung Yeung  

gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Add iterator support for
OMP_LIST_MAP.
* openmp.cc (gfc_free_omp_clauses): Free namespace in namelist for
OMP_LIST_MAP.
(gfc_match_omp_clauses): Parse 'iterator' modifier for 'map' clause.
(resolve_omp_clauses): Resolve iterators for OMP_LIST_MAP.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in
OMP_LIST_MAP clauses.  Add expressions to iter_block rather than
block.

gcc/
* gimplify.cc (compute_iterator_count): Account for difference in loop
boundaries in Fortran.
(build_iterator_loop): Change upper boundary condition for Fortran.
Insert block statements into innermost loop.
(contains_only_iterator_vars_1): New.
(contains_only_iterator_vars): New.
(extract_base_bit_offset): Add iterator argument.  Do not set
variable_offset if contains_only_iterator_vars is true.
(omp_accumulate_sibling_list): Add iterator argument to
extract_base_bit_offset.
* omp-low.cc (lower_omp_target): Add sorry if iterators used with
deep mapping.
* tree-pretty-print.cc (dump_block_node): Ignore BLOCK_SUBBLOCKS
containing iterator block statements.

gcc/testsuite/
* gfortran.dg/gomp/target-map-iterators-1.f90: New.
* gfortran.dg/gomp/target-map-iterators-2.f90: New.
* gfortran.dg/gomp/target-map-iterators-3.f90: New.

libgomp/
* target.c (kind_to_name): Handle GOMP_MAP_STRUCT and
GOMP_MAP_STRUCT_UNORD.
(gomp_add_map): New.
(gomp_merge_iterator_maps): Expand fields of a struct mapping
breadth-first.
* testsuite/libgomp.fortran/target-map-iterators-1.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-2.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-3.f90: New.
---
 gcc/fortran/dump-parse-tree.cc|  9 +-
 gcc/fortran/openmp.cc | 35 ++--
 gcc/fortran/trans-openmp.cc   | 71 
 gcc/gimplify.cc   | 76 ++---
 gcc/omp-low.cc|  5 ++
 .../gomp/target-map-iterators-1.f90   | 26 ++
 .../gomp/target-map-iterators-2.f90   | 27

[PATCH v3 5/5] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran)

2024-10-04 Thread Kwok Cheung Yeung

This patch adds parsing and translation of the 'to' and 'from' clauses 
for the 'target update' construct in Fortran.

From da8ab0cb38d2bc347cf902ec417b0397c28e24e2 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 4 Oct 2024 15:16:38 +0100
Subject: [PATCH 5/5] openmp, fortran: Add support for iterators in OpenMP
 'target update' constructs (Fortran)

This adds Fortran support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.

2024-10-04  Kwok Cheung Yeung  

gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Add iterator support for
OMP_LIST_TO and OMP_LIST_FROM.
* openmp.cc (gfc_free_omp_clauses): Free namespace for OMP_LIST_TO
and OMP_LIST_FROM.
(gfc_match_motion_var_list): Parse 'iterator' modifier.
(resolve_omp_clauses): Resolve iterators for OMP_LIST_TO and
OMP_LIST_FROM.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in
OMP_LIST_TO and OMP_LIST_FROM clauses.  Add expressions to
iter_block rather than block.

gcc/testsuite/
* gfortran.dg/gomp/target-update-iterators-1.f90: New.
* gfortran.dg/gomp/target-update-iterators-2.f90: New.
* gfortran.dg/gomp/target-update-iterators-3.f90: New.

libgomp/
* testsuite/libgomp.fortran/target-update-iterators-1.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-2.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-3.f90: New.
---
 gcc/fortran/dump-parse-tree.cc|  7 +-
 gcc/fortran/openmp.cc | 62 +--
 gcc/fortran/trans-openmp.cc   | 50 ++--
 .../gomp/target-update-iterators-1.f90| 25 ++
 .../gomp/target-update-iterators-2.f90| 22 ++
 .../gomp/target-update-iterators-3.f90| 23 ++
 .../target-update-iterators-1.f90 | 68 
 .../target-update-iterators-2.f90 | 63 +++
 .../target-update-iterators-3.f90 | 78 +++
 9 files changed, 386 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-3.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 3ee6ed1ea7f..0a2d546d3fe 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1360,7 +1360,8 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 {
   gfc_current_ns = ns_curr;
   if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND
- || list_type == OMP_LIST_MAP)
+ || list_type == OMP_LIST_MAP
+ || list_type == OMP_LIST_TO || list_type == OMP_LIST_FROM)
{
  gfc_current_ns = n->u2.ns ? n->u2.ns : ns_curr;
  if (n->u2.ns != ns_iter)
@@ -1376,6 +1377,10 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
fputs ("DEPEND (", dumpfile);
  else if (list_type == OMP_LIST_MAP)
fputs ("MAP (", dumpfile);
+ else if (list_type == OMP_LIST_TO)
+   fputs ("TO (", dumpfile);
+ else if (list_type == OMP_LIST_FROM)
+   fputs ("FROM (", dumpfile);
  else
gcc_unreachable ();
}
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 3003ba605cf..c765d5814a7 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -194,7 +194,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   for (i = 0; i < OMP_LIST_NUM; i++)
 gfc_free_omp_namelist (c->lists[i],
   i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND
-  || i == OMP_LIST_MAP,
+  || i == OMP_LIST_MAP
+  || i == OMP_LIST_TO || i == OMP_LIST_FROM,
   i == OMP_LIST_ALLOCATE,
   i == OMP_LIST_USES_ALLOCATORS,
   i == OMP_LIST_INIT);
@@ -1368,17 +1369,65 @@ gfc_match_motion_var_list (const char *str, gfc_omp_namelist **list,
   if (m != MATCH_YES)
 return m;
 
-  match m_present = gfc_match (" present : ");
+  gfc_namespace *ns_iter = NULL, *ns_curr = gfc_current_ns;
+  int present_modifier = 0, iterator_modifier = 0;
+  locus present_locus = gfc_current_locus, iterator_locus = gfc_current_locus;
 
-  m = gfc_match_omp_variable_list ("", list, false, NULL, h

RE: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-10-04 Thread Tamar Christina

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Thursday, October 3, 2024 4:45 PM
> To: Richard Sandiford 
> Cc: Soumya AR ; Tamar Christina
> ; gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; Jennifer Schmitz ;
> Pengxuan Zheng (QUIC) 
> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE 
> instruction
> 
> 
> 
> > On 3 Oct 2024, at 16:41, Richard Sandiford 
> wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > Soumya AR  writes:
> >> From 7fafcb5e0174c56205ec05406c9a412196ae93d3 Mon Sep 17 00:00:00
> 2001
> >> From: Soumya AR 
> >> Date: Thu, 3 Oct 2024 11:53:07 +0530
> >> Subject: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE 
> >> instruction
> >>
> >> This patch uses the FSCALE instruction provided by SVE to implement the
> >> standard ldexp family of functions.
> >>
> >> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
> >> following code:
> >>
> >> float
> >> test_ldexpf (float x, int i)
> >> {
> >>  return __builtin_ldexpf (x, i);
> >> }
> >>
> >> double
> >> test_ldexp (double x, int i)
> >> {
> >>  return __builtin_ldexp(x, i);
> >> }
> >>
> >> GCC Output:
> >>
> >> test_ldexpf:
> >>  b ldexpf
> >>
> >> test_ldexp:
> >>  b ldexp
> >>
> >> Since SVE has support for an FSCALE instruction, we can use this to process
> >> scalar floats by moving them to a vector register and performing an fscale 
> >> call,
> >> similar to how LLVM tackles an ldexp builtin as well.
> >>
> >> New Output:
> >>
> >> test_ldexpf:
> >>  fmov s31, w0
> >>  ptrue p7.b, all
> >>  fscale z0.s, p7/m, z0.s, z31.s
> >>  ret
> >>
> >> test_ldexp:
> >>  sxtw x0, w0
> >>  ptrue p7.b, all
> >>  fmov d31, x0
> >>  fscale z0.d, p7/m, z0.d, z31.d
> >>  ret
> >>
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
> regression.
> >> OK for mainline?
> >
> > Could we also use the .H form for __builtin_ldexpf16?
> >
> > I suppose:
> >
> >> @@ -2286,7 +2289,8 @@
> >>   (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
> >>   (V8QI "VNx8BI") (V16QI "VNx16BI")
> >>   (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
> >> -  (V4SI "VNx4BI") (V2DI "VNx2BI")])
> >> +  (V4SI "VNx4BI") (V2DI "VNx2BI")
> >> +  (SF "VNx4BI") (DF "VNx2BI")])
> >
> > ...this again raises the question what we should do for predicate
> > modes when the data mode isn't a natural SVE mode.  That came up
> > recently in relation to V1DI in the popcount patch, and for reductions
> > in the ANDV etc. patch.
> 
> Thank you for enumerating the options below.
> 
> >
> > Three obvious options are:
> >
> > (1) Use the nearest SVE mode with a full ptrue (as the patch does).
> > (2) Use the nearest SVE mode with a 128-bit ptrue.
> > (3) Add new modes V16BI, V8BI, V4BI, V2BI, and V1BI.  (And possibly BI
> >for scalars.)
> 
> Just to be clear, what do you mean by “nearest SVE mode” in this context?
> 

I think he means the smallest SVE mode that has the same unit size as the
Adv. SIMD register.  I think the idea is that we're consistent with the
modes used, so we don't end up using e.g. VNx16QI and VNx8QI etc. for
e.g. b0.

> 
> >
> > The problem with (1) is that, as Tamar pointed out, it doesn't work
> > properly with reductions.  It also isn't safe for this patch (without
> > fast-mathy options) because of FP exceptions.  Although writing to
> > a scalar FP register zeros the upper bits, and so gives a "safe" value
> > for this particular operation, nothing guarantees that all SF and DF
> > values have this zero-extended form.  They could come from subregs of
> > Advanced SIMD or SVE vectors.  The ABI also doesn't guarantee that
> > incoming SF and DF values are zero-extended.
> >
> > (2) would be safe, but would mean that we continue to have an nunits
> > disagreement between the data mode and the predicate mode.  This would
> > prevent operations being described in generic RTL in future.
> >
> > (3) is probably the cleanest representational approach, but has the problem
> > that we cannot store a fixed-length portion of an SVE predicate.
> > We would have to load and store the modes via other register classes.
> > (With PMOV we could use scalar FP loads and stores, but otherwise
> > we'd probably need secondary memory reloads.)  That said, we could
> > tell the RA to spill in a full predicate mode, so this shouldn't be
> > a problem unless the modes somehow get exposed to gimple or frontends.
> >
> > WDYT?
> 
> IMO option (2) sounds the more appealing at this stage. To me it feels
> conceptually straightforward as we are using a SVE operation clamped at
> 128 bits to “emulate” what should have been an 128-bit fixed-width mode
> operation.
> It also feels that, given the complexity of (3) and introducing new modes,
> we should go for (3) only if/when we do decide to implement these ops with
> generic RTL.

2 i

[PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e.

2024-10-04 Thread Tsung Chun Lin




0001-RISC-V-libgcc-Fix-incorrect-.cfi_offset-for-saving-r.patch
Description: Binary data

Fwd: [patch, fortran] Implement maxloc and minloc for unsigned

2024-10-04 Thread Thomas Koenig


Hello world,

the original message seems to have been rejected because the patch was
too big. The patch (which was not rejected for fortran@) can be found at

https://gcc.gnu.org/pipermail/fortran/2024-October/061127.html

 Forwarded Message 
Subject: [patch, fortran] Implement maxloc and minloc for unsigned
Date: Fri, 4 Oct 2024 09:54:37 +0200
From: Thomas Koenig 
To: fort...@gcc.gnu.org , gcc-patches 



Hello world,

please find attached the patch for implementing MAXLOC and MINLOC
for unsigned.

The patch is rather lengthy, but mostly due to combinatorial explosion
with the different return values.

Next time we update the ABI, we should treat MAXLOC and MINLOC like we
already do for FINDLOC - have one version in the library, and convert
in the front end when the user requests a different integer kind.
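
As a rough illustration of that idea, the following C sketch has a single routine return the location in the widest index type, with the narrowing to the requested kind done by the caller. The function names are hypothetical and are not libgfortran symbols; the sketch also shows why an unsigned comparison is needed, since the largest values would look negative if compared as signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single library entry point (not a libgfortran symbol):
   1-based position of the first maximum of an unsigned array, or 0 for
   a zero-sized array, returned in the widest index type.  */
static int64_t
maxloc_u32 (const uint32_t *a, size_t n)
{
  if (n == 0)
    return 0;
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (a[i] > a[best])  /* Strict '>' keeps the first maximum.  */
      best = i;
  return (int64_t) best + 1;
}

/* What a front end could emit for a request with a 4-byte result kind:
   call the single library routine and convert the result.  */
static int32_t
maxloc_u32_kind4 (const uint32_t *a, size_t n)
{
  return (int32_t) maxloc_u32 (a, n);
}
```

With this shape only one library routine per element type would be needed, instead of one per combination of element type and result kind.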

This finishes the support of all reasonable intrinsics for UNSIGNED
(or so I think - if anybody spots something reasonable, just let me
know).

The next step would then be ISO_C_BINDING; clean interfaces to C
is one of the main reasons why people want UNSIGNED in Fortran.

Regression-tested. OK for trunk?

Best regards

Thomas

gcc/fortran/ChangeLog:

* check.cc (gfc_check_minloc_maxloc): Handle BT_UNSIGNED.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Likewise.
* gfortran.texi: Document MAXLOC and MINLOC for UNSIGNED.

libgfortran/ChangeLog:

* Makefile.am: Add files for unsigned MINLOC and MAXLOC.
* Makefile.in: Regenerated.
* gfortran.map: Add files for unsigned MINLOC and MAXLOC.
* generated/maxloc0_16_m1.c: New file.
* generated/maxloc0_16_m16.c: New file.
* generated/maxloc0_16_m2.c: New file.
* generated/maxloc0_16_m4.c: New file.
* generated/maxloc0_16_m8.c: New file.
* generated/maxloc0_4_m1.c: New file.
* generated/maxloc0_4_m16.c: New file.
* generated/maxloc0_4_m2.c: New file.
* generated/maxloc0_4_m4.c: New file.
* generated/maxloc0_4_m8.c: New file.
* generated/maxloc0_8_m1.c: New file.
* generated/maxloc0_8_m16.c: New file.
* generated/maxloc0_8_m2.c: New file.
* generated/maxloc0_8_m4.c: New file.
* generated/maxloc0_8_m8.c: New file.
* generated/maxloc1_16_m1.c: New file.
* generated/maxloc1_16_m2.c: New file.
* generated/maxloc1_16_m4.c: New file.
* generated/maxloc1_16_m8.c: New file.
* generated/maxloc1_4_m1.c: New file.
* generated/maxloc1_4_m16.c: New file.
* generated/maxloc1_4_m2.c: New file.
* generated/maxloc1_4_m4.c: New file.
* generated/maxloc1_4_m8.c: New file.
* generated/maxloc1_8_m1.c: New file.
* generated/maxloc1_8_m16.c: New file.
* generated/maxloc1_8_m2.c: New file.
* generated/maxloc1_8_m4.c: New file.
* generated/maxloc1_8_m8.c: New file.
* generated/minloc0_16_m1.c: New file.
* generated/minloc0_16_m16.c: New file.
* generated/minloc0_16_m2.c: New file.
* generated/minloc0_16_m4.c: New file.
* generated/minloc0_16_m8.c: New file.
* generated/minloc0_4_m1.c: New file.
* generated/minloc0_4_m16.c: New file.
* generated/minloc0_4_m2.c: New file.
* generated/minloc0_4_m4.c: New file.
* generated/minloc0_4_m8.c: New file.
* generated/minloc0_8_m1.c: New file.
* generated/minloc0_8_m16.c: New file.
* generated/minloc0_8_m2.c: New file.
* generated/minloc0_8_m4.c: New file.
* generated/minloc0_8_m8.c: New file.
* generated/minloc1_16_m1.c: New file.
* generated/minloc1_16_m16.c: New file.
* generated/minloc1_16_m2.c: New file.
* generated/minloc1_16_m4.c: New file.
* generated/minloc1_16_m8.c: New file.
* generated/minloc1_4_m1.c: New file.
* generated/minloc1_4_m16.c: New file.
* generated/minloc1_4_m2.c: New file.
* generated/minloc1_4_m4.c: New file.
* generated/minloc1_4_m8.c: New file.
* generated/minloc1_8_m1.c: New file.
* generated/minloc1_8_m16.c: New file.
* generated/minloc1_8_m2.c: New file.
* generated/minloc1_8_m4.c: New file.
* generated/minloc1_8_m8.c: New file.

[PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Jonathan Wakely

This doesn't really belong in our testsuite, because the sole purpose of
the new test is to find bugs in the Glibc wrappers (like the one linked
below). But maybe it's a kindness to do it in our testsuite, because we
already have this test in place, and one Glibc bug was already found
thanks to Sam running the existing test with _FORTIFY_SOURCE defined.

Should we do this?

-- >8 --

Add a new testcase that repeats 17_intro/names.cc but with
_FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
now).

libstdc++-v3/ChangeLog:

PR libstdc++/116210
* testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
that use it in the fortify wrappers.
* testsuite/17_intro/names_fortify.cc: New test.
---
 libstdc++-v3/testsuite/17_intro/names.cc | 7 +++
 libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++
 2 files changed, 13 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 6b9a3639aad..bbf45b93dee 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -377,4 +377,11 @@
 #undef y
 #endif
 
+#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE
+# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41)
+// https://sourceware.org/bugzilla/show_bug.cgi?id=32052
+#  undef sz
+# endif
+#endif
+
 #include 
diff --git a/libstdc++-v3/testsuite/17_intro/names_fortify.cc b/libstdc++-v3/testsuite/17_intro/names_fortify.cc
new file mode 100644
index 000..c975412074b
--- /dev/null
+++ b/libstdc++-v3/testsuite/17_intro/names_fortify.cc
@@ -0,0 +1,6 @@
+// { dg-do compile { target *-*-linux* } }
+// { dg-add-options no_pch }
+
+#define _FORTIFY_SOURCE 2
+// Now we can define the macros to poison uses of non-reserved names:
+#include "names.cc"
-- 
2.46.1

[patch,avr] Implement TARGET_FLOATN_MODE

2024-10-04 Thread Georg-Johann Lay


This patch implements TARGET_FLOATN_MODE which maps
_Float32[x] to SFmode and _Float64[x] to DFmode.

There is currently no library support for extended float types,
but these settings are more reasonable for avr (and they make
more tests pass).

Ok for trunk?

Johann

--

AVR: Implement TARGET_FLOATN_MODE.

gcc/
* config/avr/avr.cc (avr_floatn_mode): New static function.
(TARGET_FLOATN_MODE): New define.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 92013c3845d..b73c251b64b 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -15473,6 +15473,24 @@ avr_c_mode_for_floating_type (tree_index ti)
 }
 
 
+/* Implement `TARGET_FLOATN_MODE'.  */
+
+static opt_scalar_float_mode
+avr_floatn_mode (int n, bool /*extended*/)
+{
+  if (n == 32)
+return SFmode;
+
+  // Notice that -m[long-]double= just tells which library (AVR-LibC
+  // or libgcc/libf7) is providing symbols like sin.  DFmode support
+  // is provided by libf7 no matter what.
+  if (n == 64)
+return DFmode;
+
+  return opt_scalar_float_mode ();
+}
+
+
 /* Worker function for `FLOAT_LIB_COMPARE_RETURNS_BOOL'.  */
 
 bool
@@ -15705,6 +15723,9 @@ avr_use_lra_p ()
 #undef TARGET_C_MODE_FOR_FLOATING_TYPE
 #define TARGET_C_MODE_FOR_FLOATING_TYPE avr_c_mode_for_floating_type
 
+#undef  TARGET_FLOATN_MODE
+#define TARGET_FLOATN_MODE avr_floatn_mode
+
 gcc_target targetm = TARGET_INITIALIZER;

Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved

2024-10-04 Thread Christophe Lyon

Hi!


On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau  wrote:
>
> ping
>
> On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau  wrote:
> >
> > This patch deals with consequences but not the root cause though.
> >
> > There are 5 cases which are subjects to rewrite:
> > case #1:
> >   mov ip, r1
> >   add r2, ip
> >   # ip is dead here
> > can be rewritten as:
> >   adds r2, r1

Why replace 'add' with 'adds' ?

Thanks,

Christophe

> >
> > case #2:
> >   add ip, r1
> >   mov r1, ip
> >   # ip is dead here
> > can be rewritten as:
> >   add r1, ip
> >
> > case #3:
> >   mov ip, r1
> >   add r2, ip
> >   add r3, ip
> >   # ip is dead here
> > can be rewritten as:
> >   adds r2, r1
> >   adds r3, r1
> >
> > case #4:
> >   mov ip, r1
> >   add ip, r2
> >   mov r1, ip
> > can be rewritten as:
> >   adds r1, r2
> >   mov  ip, r1 <- might be eliminated too, if ip is dead
> >
> > case #5 (arbitrary):
> >   mov  r1, ip
> >   subs r2, r1, r2
> >   mov  ip, r2
> >   # r1 is dead here
> > can be rewritten as:
> >   rsbs r1, r2, #0
> >   add  ip, r1
> >   movs r2, ip <- might be eliminated, if r2 is dead
> >
> > Speed profit wasn't checked but size changes are the following:
> >libgcc:  -132 bytes / -0.25%
> >  libc: -1262 bytes / -0.55%
> >  libm:  -384 bytes / -0.42%
> > libstdc++: -2258 bytes / -0.30%
> >
> > No tests provided because it's hard to force GCC to emit HI_REGS
> > in a small and straightforward function.
> >
> > Signed-off-by: Siarhei Volkau 
> > ---
> >  gcc/config/arm/thumb1.md | 93 +++-
> >  1 file changed, 92 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > index d7074b43f60..9da4af9eccd 100644
> > --- a/gcc/config/arm/thumb1.md
> > +++ b/gcc/config/arm/thumb1.md
> > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn"
> > (set_attr "conds" "clob")
> > (set_attr "type" "multiple")]
> >  )
> > -
> > +
> > +;; bad code emitted when HI_REGS involved in addition
> > +;; subtract also might happen rarely
> > +
> > +;; case #1:
> > +;; mov ip, r1
> > +;; add r2, ip # ip is dead after that
> > +(define_peephole2
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +   (match_operand:SI 1 "register_operand" ""))
> > +   (set (match_operand:SI 2 "register_operand" "")
> > +   (plus:SI (match_dup 2) (match_dup 0)))]
> > +  "TARGET_THUMB1
> > +&& peep2_reg_dead_p (2, operands[0])
> > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > +  [(set (match_dup 2)
> > +   (plus:SI (match_dup 2) (match_dup 1)))]
> > +  "")
> > +
> > +;; case #2:
> > +;; add ip, r1
> > +;; mov r1, ip # ip is dead after that
> > +(define_peephole2
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +   (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "")))
> > +   (set (match_dup 1) (match_dup 0))]
> > +  "TARGET_THUMB1
> > +&& peep2_reg_dead_p (2, operands[0])
> > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > +  [(set (match_dup 1)
> > +   (plus:SI (match_dup 1) (match_dup 0)))]
> > +  "")
> > +
> > +;; case #3:
> > +;; mov ip, r1
> > +;; add r2, ip
> > +;; add r3, ip # ip is dead after that
> > +(define_peephole2
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +   (match_operand:SI 1 "register_operand" ""))
> > +   (set (match_operand:SI 2 "register_operand" "")
> > +   (plus:SI (match_dup 2) (match_dup 0)))
> > +   (set (match_operand:SI 3 "register_operand" "")
> > +   (plus:SI (match_dup 3) (match_dup 0)))]
> > +  "TARGET_THUMB1
> > +&& peep2_reg_dead_p (3, operands[0])
> > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > +  [(set (match_dup 2)
> > +   (plus:SI (match_dup 2) (match_dup 1)))
> > +   (set (match_dup 3)
> > +   (plus:SI (match_dup 3) (match_dup 1)))]
> > +  "")
> > +
> > +;; case #4:
> > +;; mov ip, r1
> > +;; add ip, r2
> > +;; mov r1, ip
> > +(define_peephole2
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +   (match_operand:SI 1 "register_operand" ""))
> > +   (set (match_dup 0)
> > +   (plus:SI (match_dup 0) (match_operand:SI 2 "register_operand" "")))
> > +   (set (match_dup 1)
> > +   (match_dup 0))]
> > +  "TARGET_THUMB1
> > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > +  [(set (match_dup 1)
> > +   (plus:SI (match_dup 1) (match_dup 2)))
> > +   (set (match_dup 0) (match_dup 1))]  ;; likely will be eliminated
> > +  "")
> > +
> > +;; case #5:
> > +;; mov  r1, ip
> > +;; subs r2, r1, r2
> > +;; mov  ip, r2  # r1 is dead after
> > +(define_peephole2
> > +  [(set (match_operand:SI 1 "register_operand" "")
> > +   (match_operand:SI 0 "register_operand" ""))
> > +   (set (match_operand:SI 2 "register_operand" "")
> > +(minus:SI (match_dup 1) (match_dup 2)))
> > +   (set (match_dup 0)
> > +   (match_dup 2))]
> > +  "TARGET_THUMB1
> > +&& peep2_reg_dead_p (3, operands[1])
> > +&& REGNO_REG_CLASS (REGNO (operan

Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Siddhesh Poyarekar


On 2024-10-04 07:52, Jonathan Wakely wrote:

> This doesn't really belong in our testsuite, because the sole purpose of
> the new test is to find bugs in the Glibc wrappers (like the one linked
> below). But maybe it's a kindness to do it in our testsuite, because we
> already have this test in place, and one Glibc bug was already found
> thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
>
> Should we do this?
>
> -- >8 --
>
> Add a new testcase that repeats 17_intro/names.cc but with
> _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
> https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
> now).
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/116210
> * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
> that use it in the fortify wrappers.
> * testsuite/17_intro/names_fortify.cc: New test.
> ---
>   libstdc++-v3/testsuite/17_intro/names.cc | 7 +++
>   libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++
>   2 files changed, 13 insertions(+)
>   create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc
>
> diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
> index 6b9a3639aad..bbf45b93dee 100644
> --- a/libstdc++-v3/testsuite/17_intro/names.cc
> +++ b/libstdc++-v3/testsuite/17_intro/names.cc
> @@ -377,4 +377,11 @@
>   #undef y
>   #endif
>
> +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE
> +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41)
> +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052
> +#  undef sz
> +# endif
> +#endif


We've backported the fix to stable branches, so the version check isn't 
really that reliable.
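
To make that concern concrete, here is a small C sketch of the range test in the quoted guard. It uses the same comparison that glibc's `__GLIBC_PREREQ` macro performs, but as a function taking the version as arguments so that different releases can be tried; the function names are hypothetical.

```c
#include <assert.h>

/* Same comparison that glibc's __GLIBC_PREREQ macro performs, written as
   a function so arbitrary versions can be tried.  The name is
   hypothetical.  */
static int
version_at_least (int major, int minor, int want_major, int want_minor)
{
  return ((major << 16) + minor) >= ((want_major << 16) + want_minor);
}

/* The guard from the patch: 2.35 <= version < 2.41.  */
static int
guard_undefs_sz (int major, int minor)
{
  return version_at_least (major, minor, 2, 35)
	 && !version_at_least (major, minor, 2, 41);
}
```

The guard only sees the release number: a 2.39 build with the fix backported and a 2.39 build without it are indistinguishable, so both take the `#undef sz` path.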


Sid

Re: [PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e.

2024-10-04 Thread Jeff Law





On 10/4/24 1:23 AM, Tsung Chun Lin wrote:


0001-RISC-V-libgcc-Fix-incorrect-.cfi_offset-for-saving-r.patch

> From 8b3c5ebe8aacbcc4ddf1be8dea9a555e7e1bcc39 Mon Sep 17 00:00:00 2001
> From: Jim Lin
> Date: Fri, 4 Oct 2024 14:48:12 +0800
> Subject: [PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in
>   __riscv_save_[0-3] on ilp32e.
>
> libgcc/ChangeLog:
>
>  * config/riscv/save-restore.S: Fix .cfi_offset for saving ra in
>    __riscv_save_[0-3] on ilp32e.

Thanks.  Looks correct to me and I've pushed it to the trunk.  I checked 
all the other .cfi_offsets and they looked correct to me.


Curious, how did you find this (and the other error you fixed recently)?

Jeff

Re: [PATCH 1/2] gcc: make Valgrind errors fatal during bootstrap

2024-10-04 Thread Jeff Law





On 10/2/24 8:39 PM, Sam James wrote:

> Valgrind doesn't error out by default which means bootstrap issues like
> in PR116945 can easily be missed: pass --error-exitcode=1 to handle this.
>
> While here, also set --trace-children=yes to cover child processes
> of tools invoked during the build.
>
> Note that this only handles tools invoked during the build, it doesn't
> cover everything that --enable-checking=valgrind does.
>
> gcc/ChangeLog:
> PR other/116945
> PR other/116947
>
> * configure: Regenerate.
> * configure.ac (valgrind_cmd): Pass additional options.

But is this going to cause all bootstraps with Ada to fail?  That's how 
I read 116945 which was closed as WONTFIX.  Or am I mis-interpreting 
that BZ and its interaction with this patch?


jeff

Re: [PATCH 3/3] gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]

2024-10-04 Thread Richard Biener

On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski  wrote:
>
> While looking into the ifcombine, I noticed that rewrite_to_defined_overflow
> was rewriting already defined code. In the previous attempt at fixing this,
> the review mentioned we should not be calling rewrite_to_defined_overflow
> in those cases. The places which called rewrite_to_defined_overflow didn't
> always check the lhs of the assignment. This fixes the problem by
> introducing a helper function which is to be used before calling
> rewrite_to_defined_overflow.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.
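
For readers following the thread, the kind of rewrite that rewrite_to_defined_overflow performs (carrying out a signed operation in the corresponding unsigned type) can be illustrated in plain C. This is only a source-level analogy with hypothetical function names; the actual transformation operates on GIMPLE statements.

```c
#include <assert.h>
#include <limits.h>

/* Signed addition: overflow is undefined behaviour, so the compiler may
   assume it never happens.  */
static int
add_signed (int a, int b)
{
  return a + b;
}

/* The same addition carried out in the corresponding unsigned type, where
   wrap-around is well defined.  Converting an out-of-range result back to
   int is implementation-defined before C23; GCC documents it as reduction
   modulo 2^N.  */
static int
add_defined (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}
```

A statement that is already in the second form, or whose type does not have undefined overflow, has nothing to rewrite, which is the situation the new helper is meant to detect before the rewrite is attempted.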

> gcc/ChangeLog:
>
> PR tree-optimization/111276
> * gimple-fold.cc (arith_code_with_undefined_signed_overflow): Make 
> static.
> (gimple_with_undefined_signed_overflow): New function.
> * gimple-fold.h (arith_code_with_undefined_signed_overflow): Remove.
> (gimple_with_undefined_signed_overflow): Add declaration.
> * tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Use
> gimple_with_undefined_signed_overflow instead of manually
> checking lhs and the code of the stmt.
> (predicate_statements): Likewise.
> * tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute): Likewise.
> * tree-ssa-loop-im.cc (move_computations_worker): Likewise.
> * tree-ssa-reassoc.cc (update_range_test): Likewise. Reformat.
> * tree-scalar-evolution.cc (final_value_replacement_loop): Use
> gimple_with_undefined_signed_overflow instead of
> arith_code_with_undefined_signed_overflow.
> * tree-ssa-loop-split.cc (split_loop): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-fold.cc   | 26 ++-
>  gcc/gimple-fold.h|  2 +-
>  gcc/tree-if-conv.cc  | 16 +++
>  gcc/tree-scalar-evolution.cc |  5 +
>  gcc/tree-ssa-ifcombine.cc| 10 ++---
>  gcc/tree-ssa-loop-im.cc  |  6 +-
>  gcc/tree-ssa-loop-split.cc   |  5 +
>  gcc/tree-ssa-reassoc.cc  | 40 +++-
>  8 files changed, 50 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 942de7720fd..0b49d6754e2 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -8991,7 +8991,7 @@ gimple_fold_indirect_ref (tree t)
> integer types involves undefined behavior on overflow and the
> operation can be expressed with unsigned arithmetic.  */
>
> -bool
> +static bool
>  arith_code_with_undefined_signed_overflow (tree_code code)
>  {
>switch (code)
> @@ -9008,6 +9008,30 @@ arith_code_with_undefined_signed_overflow (tree_code 
> code)
>  }
>  }
>
> +/* Return true if STMT has an operation that operates on a signed
> +   integer types involves undefined behavior on overflow and the
> +   operation can be expressed with unsigned arithmetic.  */
> +
> +bool
> +gimple_with_undefined_signed_overflow (gimple *stmt)
> +{
> +  if (!is_gimple_assign (stmt))
> +return false;
> +  tree lhs = gimple_assign_lhs (stmt);
> +  if (!lhs)
> +return false;
> +  tree lhs_type = TREE_TYPE (lhs);
> +  if (!INTEGRAL_TYPE_P (lhs_type)
> +  && !POINTER_TYPE_P (lhs_type))
> +return false;
> +  if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
> +return false;
> +  if (!arith_code_with_undefined_signed_overflow
> +   (gimple_assign_rhs_code (stmt)))
> +return false;
> +  return true;
> +}
> +
>  /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic
> operation that can be transformed to unsigned arithmetic by converting
> its operand, carrying out the operation in the corresponding unsigned
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index dc709d515a9..165325392c9 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -59,7 +59,7 @@ extern tree gimple_get_virt_method_for_vtable 
> (HOST_WIDE_INT, tree,
>  extern tree gimple_fold_indirect_ref (tree);
>  extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
>  extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
> -extern bool arith_code_with_undefined_signed_overflow (tree_code);
> +extern bool gimple_with_undefined_signed_overflow (gimple *);
>  extern void rewrite_to_defined_overflow (gimple_stmt_iterator *);
>  extern gimple_seq rewrite_to_defined_overflow (gimple *);
>  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 3b04d1e8d34..f5aa6c04fc9 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1067,11 +1067,7 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
> fprintf (dump_file, "tree could trap...\n");
>return false;
>  }
> -  else if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> -   || POINTER_TYPE_P (TREE_TYPE (lhs)))
> -  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
> -  && arith_code_with_undefined_signed_overflow
> -
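
A minimal sketch of the kind of rewrite the new predicate guards, under stated assumptions: the names add_wrapping and check_wraps are illustrative and do not come from the patch, and the conversion back to int relies on the modular behaviour that GCC documents (and that C++20 guarantees). It shows a signed addition, whose overflow is undefined, re-expressed in unsigned arithmetic:

```cpp
#include <cassert>
#include <climits>

// Add two ints without signed-overflow UB: do the arithmetic in unsigned,
// which wraps modulo 2^N, and convert the result back.
int add_wrapping(int a, int b) {
  unsigned ua = static_cast<unsigned>(a);
  unsigned ub = static_cast<unsigned>(b);
  return static_cast<int>(ua + ub);
}

// INT_MAX + 1 would be undefined as a signed add; here it simply wraps.
void check_wraps() {
  assert(add_wrapping(INT_MAX, 1) == INT_MIN);
}
```

This only illustrates the source-level idea; the patch itself works on GIMPLE statements, not on C++ expressions.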

Re: [PATCH 2/3] cfgexpand: Handle scope conflicts better [PR111422]

2024-10-04 Thread Richard Biener

On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski  wrote:
>
> After fixing loop-im to do the correct overflow rewriting
> for pointer types too. We end up with code like:
> ```
>   _9 = (unsigned long) &g;
>   _84 = _9 + 18446744073709551615;
>   _11 = _42 + _84;
>   _44 = (signed char *) _11;
> ...
>   *_44 = 10;
>   g ={v} {CLOBBER(eos)};
> ...
>   n[0] = &f;
>   *_44 = 8;
>   g ={v} {CLOBBER(eos)};
> ```
> Which was not being recognized by the scope conflicts code.
> This was because it only handled a one-level walk back rather than multiple
> levels.
> This fixes it by using a work_list to avoid huge recursion and a visited
> bitmap to avoid going into an infinite loop when dealing with loops.

Ick.  This is now possibly an unbounded walk from every use (even duplicate
uses!).  Micro-optimizing would be restricting the INTEGRAL_TYPE_P types to ones
matching pointer size.  Another micro-optimization would be to track/cache
whether an SSA def is based on a pointer, or, going further, to cache which
pointer(s!) it is based on.

There are testcases in bugzilla somewhere that are hard on compile-time in this
code and I can imagine a trivial degenerate one to trigger the issue.

Richard.

> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/111422
>
> gcc/ChangeLog:
>
> * cfgexpand.cc (add_scope_conflicts_2): Rewrite to be a full walk
> of all operands and their uses.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/cfgexpand.cc | 46 +++---
>  1 file changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 6c1096363af..2e653d7207c 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -573,32 +573,40 @@ visit_conflict (gimple *, tree op, tree, void *data)
>
>  /* Helper function for add_scope_conflicts_1.  For USE on
> a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to be
> -   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  */
> +   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. Also walk
> +   the assignments backwards as they might be based on an ADDR_EXPR.  */
>
> -static inline void
> +static void
>  add_scope_conflicts_2 (tree use, bitmap work,
>walk_stmt_load_store_addr_fn visit)
>  {
> -  if (TREE_CODE (use) == SSA_NAME
> -  && (POINTER_TYPE_P (TREE_TYPE (use))
> - || INTEGRAL_TYPE_P (TREE_TYPE (use))))
> +  auto_vec work_list;
> +  auto_bitmap visited_ssa_names;
> +  work_list.safe_push (use);
> +
> +  while (!work_list.is_empty())
>  {
> -  gimple *g = SSA_NAME_DEF_STMT (use);
> -  if (gassign *a = dyn_cast <gassign *> (g))
> +  use = work_list.pop();
> +  if (!use)
> +   continue;
> +  if (TREE_CODE (use) == ADDR_EXPR)
> +   visit (nullptr, TREE_OPERAND (use, 0), use, work);
> +  else if (TREE_CODE (use) == SSA_NAME
> +  && (POINTER_TYPE_P (TREE_TYPE (use))
> +  || INTEGRAL_TYPE_P (TREE_TYPE (use))))
> {
> - if (tree op = gimple_assign_rhs1 (a))
> -   if (TREE_CODE (op) == ADDR_EXPR)
> - visit (a, TREE_OPERAND (op, 0), op, work);
> + gimple *g = SSA_NAME_DEF_STMT (use);
> + if (!bitmap_set_bit (visited_ssa_names, SSA_NAME_VERSION(use)))
> +   continue;
> + if (gassign *a = dyn_cast <gassign *> (g))
> +   {
> + for (unsigned i = 1; i < gimple_num_ops (g); i++)
> +   work_list.safe_push (gimple_op (a, i));
> +   }
> + else if (gphi *p = dyn_cast <gphi *> (g))
> +   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> + work_list.safe_push (gimple_phi_arg_def (p, i));
> }
> -  else if (gphi *p = dyn_cast <gphi *> (g))
> -   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> - if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
> -   if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
> - {
> -   if (tree op = gimple_assign_rhs1 (a))
> - if (TREE_CODE (op) == ADDR_EXPR)
> -   visit (a, TREE_OPERAND (op, 0), op, work);
> - }
>  }
>  }
>
> --
> 2.34.1
>
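
The work-list walk described in the patch description can be sketched outside of GCC as follows. This is only an illustration under assumptions: Node, reachable_addrs and the index-based graph are hypothetical stand-ins for SSA names and their defining statements, not GCC data structures. It shows why an explicit stack avoids deep recursion and why the visited flags are needed once the operand graph contains cycles:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SSA definition: each node lists the
// indices of its operands and says whether it is an address-taking leaf.
struct Node {
  bool is_addr;
  std::vector<std::size_t> operands;
};

// Collect every address leaf reachable from `start` by walking operand
// edges backwards.  The explicit stack replaces recursion and the visited
// flags stop the walk from cycling through loop-carried (PHI-like) edges.
std::vector<std::size_t> reachable_addrs(const std::vector<Node>& nodes,
                                         std::size_t start) {
  assert(start < nodes.size());
  std::vector<std::size_t> found;
  std::vector<bool> visited(nodes.size(), false);
  std::vector<std::size_t> work_list{start};

  while (!work_list.empty()) {
    std::size_t cur = work_list.back();
    work_list.pop_back();
    if (visited[cur])
      continue;  // already handled, possibly via a cycle
    visited[cur] = true;
    if (nodes[cur].is_addr) {
      found.push_back(cur);
      continue;
    }
    for (std::size_t op : nodes[cur].operands)
      work_list.push_back(op);
  }
  return found;
}
```

Note that the visited set here is per call, as in the patch, so repeated calls from different uses redo the walk; caching results across calls, as suggested in the review, would be a separate design.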

Re: [PATCH 1/3] aarch64: libgcc: Cleanup warnings in lse.S

2024-10-04 Thread Kyrylo Tkachov




> On 3 Oct 2024, at 21:44, Christophe Lyon  wrote:
> 
> Since
>  Commit c608ada288ced0268c1fd4136f56c34b24d4
>  Author: Zac Walker 
>  CommitDate: 2024-01-23 15:32:30 +
> 
>  Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` 
> target
> 
> lse.S includes aarch64-asm.h, leading to a conflicting definition of macro 
> 'L':
> - in lse.S it expands to either '' or 'L'
> - in aarch64-asm.h it is used to generate .L ## label
> 
> lse.S does not use the second, so this patch just undefines L after
> the inclusion of aarch64-asm.h.


Ok.
Thanks,
Kyrill

> 
> libgcc/
>* config/aarch64/lse.S: Undefine L() macro.
> ---
> libgcc/config/aarch64/lse.S | 4 
> 1 file changed, 4 insertions(+)
> 
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> index ecef47086c6..77b3dc5a981 100644
> --- a/libgcc/config/aarch64/lse.S
> +++ b/libgcc/config/aarch64/lse.S
> @@ -54,6 +54,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
> #include "aarch64-asm.h"
> #include "auto-target.h"
> 
> +/* L is defined in aarch64-asm.h for a different purpose than why we
> +   use it here.  */
> +#undef L
> +
> /* Tell the assembler to accept LSE instructions.  */
> #ifdef HAVE_AS_LSE
>.arch armv8-a+lse
> --
> 2.34.1
>

[COMMITTED 1/2] testsuite: add missing braces around dejagnu directives

2024-10-04 Thread Sam James

gcc/testsuite/ChangeLog:

* c-c++-common/analyzer/flex-without-call-summaries.c: Add missing 
brace.
* c-c++-common/analyzer/malloc-callbacks.c: Ditto.
* gcc.dg/Wstringop-overflow-79.c: Ditto.
* gcc.dg/Wstringop-overflow-80.c: Ditto.
---
 .../analyzer/flex-without-call-summaries.c|  2 +-
 .../c-c++-common/analyzer/malloc-callbacks.c  |  2 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-79.c  | 28 +--
 gcc/testsuite/gcc.dg/Wstringop-overflow-80.c  | 28 +--
 4 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c 
b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
index 092d78486219..e68ac2f3b749 100644
--- a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
+++ b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
@@ -889,7 +889,7 @@ static int yy_get_next_buffer (void)
}
else
/* Can't grow it, we don't own it. */
-   b->yy_ch_buf = NULL;  /* { dg-bogus "leak" "PR 
analyzer/103546"  */
+   b->yy_ch_buf = NULL;  /* { dg-bogus "leak" "PR 
analyzer/103546" } */
 
if ( ! b->yy_ch_buf )
YY_FATAL_ERROR(
diff --git a/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c 
b/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c
index 0ba4f3824c62..422b40373634 100644
--- a/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c
+++ b/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c
@@ -64,7 +64,7 @@ void test_5 (void)
 {
   allocator_t alloc_fn = get_alloca ();
   deallocator_t dealloc_fn = get_free ();
-  int *ptr = (int *) alloc_fn (sizeof (int)); /* dg-message "region created on 
stack here" } */
+  int *ptr = (int *) alloc_fn (sizeof (int)); /* { dg-message "region created 
on stack here" } */
   dealloc_fn (ptr); /* { dg-warning "'free' of 'ptr' which points to memory on 
the stack" } */
 }
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c 
b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c
index 15eb26fbdb73..e97cb91ba18d 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c
@@ -5,8 +5,8 @@
{ dg-do compile }
{ dg-options "-O0 -Wno-array-bounds" } */
 
-extern char a[8]; // dg-message at offset \\\[3, 6] into 
destination object 'a'" "note 1" }
-  // dg-message at offset \\\[5, 8] into 
destination object 'a'" "note 2" { target *-*-* } .-1 }
+extern char a[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'a'" "note 1" }
+  // { dg-message "at offset \\\[5, 8] into 
destination object 'a'" "note 2" { target *-*-* } .-1 }
 
 void test_2_notes (int i)
 {
@@ -15,9 +15,9 @@ void test_2_notes (int i)
 }
 
 
-extern char b[8]; // dg-message at offset \\\[3, 6] into 
destination object 'b'" "note 1" }
-  // dg-message at offset \\\[4, 7] into 
destination object 'b'" "note 2" { target *-*-* } .-1 }
-  // dg-message at offset \\\[5, 8] into 
destination object 'b'" "note 3" { target *-*-* } .-2 }
+extern char b[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'b'" "note 1" }
+  // { dg-message "at offset \\\[4, 7] into 
destination object 'b'" "note 2" { target *-*-* } .-1 }
+  // { dg-message "at offset \\\[5, 8] into 
destination object 'b'" "note 3" { target *-*-* } .-2 }
 
 void test_3_notes (int i)
 {
@@ -26,10 +26,10 @@ void test_3_notes (int i)
 }
 
 
-extern char c[8]; // dg-message at offset \\\[3, 6] into 
destination object 'c'" "note 1" }
-  // dg-message at offset \\\[4, 7] into 
destination object 'c'" "note 2" { target *-*-* } .-1 }
-  // dg-message at offset \\\[5, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
-  // dg-message at offset \\\[6, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
+extern char c[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'c'" "note 1" }
+  // { dg-message "at offset \\\[4, 7] into 
destination object 'c'" "note 2" { target *-*-* } .-1 }
+  // { dg-message "at offset \\\[5, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
+  // { dg-message "at offset \\\[6, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
 
 void test_4_notes (int i)
 {
@@ -47,11 +47,11 @@ void test_4_notes (int i)
 }
 
 
-extern char d[8]; // dg-me

[COMMITTED 2/2] testsuite: fix two newly-running -Wstringop-overflow test directives

2024-10-04 Thread Sam James

This didn't show up until the previous commit which fixed the directive
syntax. The indexing was off for the notes.

gcc/testsuite/ChangeLog:

* gcc.dg/Wstringop-overflow-79.c: Fix index for notes.
* gcc.dg/Wstringop-overflow-80.c: Ditto.
---
 gcc/testsuite/gcc.dg/Wstringop-overflow-79.c | 6 +++---
 gcc/testsuite/gcc.dg/Wstringop-overflow-80.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c 
b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c
index e97cb91ba18d..87bf775c0b2b 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c
@@ -29,7 +29,7 @@ void test_3_notes (int i)
 extern char c[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'c'" "note 1" }
   // { dg-message "at offset \\\[4, 7] into 
destination object 'c'" "note 2" { target *-*-* } .-1 }
   // { dg-message "at offset \\\[5, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
-  // { dg-message "at offset \\\[6, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
+  // { dg-message "at offset \\\[6, 8] into 
destination object 'c'" "note 4" { target *-*-* } .-3 }
 
 void test_4_notes (int i)
 {
@@ -50,8 +50,8 @@ void test_4_notes (int i)
 extern char d[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'd'" "note 1" }
   // { dg-message "at offset \\\[4, 7] into 
destination object 'd'" "note 2" { target *-*-* } .-1 }
   // { dg-message "at offset \\\[5, 8] into 
destination object 'd'" "note 3" { target *-*-* } .-2 }
-  // { dg-message "at offset \\\[6, 8] into 
destination object 'd'" "note 3" { target *-*-* } .-3 }
-  // { dg-message "at offset \\\[7, 8] into 
destination object 'd'" "note 3" { target *-*-* } .-4 }
+  // { dg-message "at offset \\\[6, 8] into 
destination object 'd'" "note 4" { target *-*-* } .-3 }
+  // { dg-message "at offset \\\[7, 8] into 
destination object 'd'" "note 5" { target *-*-* } .-4 }
 
 void test_5_notes (int i)
 {
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c 
b/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c
index c74ca3a7918b..f49b5ffc636b 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c
@@ -29,7 +29,7 @@ void test_3_notes (int i)
 extern char c[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'c'" "note 1" }
   // { dg-message "at offset \\\[4, 7] into 
destination object 'c'" "note 2" { target *-*-* } .-1 }
   // { dg-message "at offset \\\[5, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
-  // { dg-message "at offset \\\[6, 8] into 
destination object 'c'" "note 3" { target *-*-* } .-2 }
+  // { dg-message "at offset \\\[6, 8] into 
destination object 'c'" "note 4" { target *-*-* } .-3 }
 
 void test_4_notes (int i)
 {
@@ -50,8 +50,8 @@ void test_4_notes (int i)
 extern char d[8]; // { dg-message "at offset \\\[3, 6] into 
destination object 'd'" "note 1" }
   // { dg-message "at offset \\\[4, 7] into 
destination object 'd'" "note 2" { target *-*-* } .-1 }
   // { dg-message "at offset \\\[5, 8] into 
destination object 'd'" "note 3" { target *-*-* } .-2 }
-  // { dg-message "at offset \\\[6, 8] into 
destination object 'd'" "note 3" { target *-*-* } .-3 }
-  // { dg-message "at offset \\\[7, 8] into 
destination object 'd'" "note 3" { target *-*-* } .-4 }
+  // { dg-message "at offset \\\[6, 8] into 
destination object 'd'" "note 4" { target *-*-* } .-3 }
+  // { dg-message "at offset \\\[7, 8] into 
destination object 'd'" "note 5" { target *-*-* } .-4 }
 
 void test_5_notes (int i)
 {
-- 
2.46.2

Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-04 Thread Christophe Lyon

On Fri, 4 Oct 2024 at 10:50, Kyrylo Tkachov  wrote:
>
>
>
> > On 3 Oct 2024, at 21:44, Christophe Lyon  wrote:
> >
> > When --enable-werror is enabled when running the top-level configure,
> > it passes --enable-werror-always to subdirs.  Some of them, like
> > libgcc, ignore it.
> >
> > This patch adds support for it, enabled only for aarch64, to avoid
> > breaking bootstrap for other targets.
> >
>
> The aarch64 part is ok but you’ll need a wider libgcc approval.
> It seems to me that if libgcc is intended to compile cleanly with -Werror 
> then it should be a libgcc-wide change, but maybe doing it port-by-port is 
> the only practical way of getting there?

Indeed, it was not clear to me if libgcc is supposed to compile
without warnings.  My feeling is that warnings are often worth having
a look at, but without -Werror they go unnoticed.

Adding Ian in cc as libgcc maintainer.

Thanks,

Christophe

> Thanks,
> Kyrill
>
>
> > The patch also adds -Wno-prio-ctor-dtor to avoid a warning when compiling 
> > lse_init.c
> >
> >libgcc/
> >* Makefile.in (WERROR): New.
> >* config/aarch64/t-aarch64: Handle WERROR. Always use
> >-Wno-prio-ctor-dtor.
> >* configure.ac: Add support for --enable-werror-always.
> >* configure: Regenerate.
> > ---
> > libgcc/Makefile.in  |  1 +
> > libgcc/config/aarch64/t-aarch64 |  1 +
> > libgcc/configure| 31 +++
> > libgcc/configure.ac |  5 +
> > 4 files changed, 38 insertions(+)
> >
> > diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> > index 0e46e9ef768..eca62546642 100644
> > --- a/libgcc/Makefile.in
> > +++ b/libgcc/Makefile.in
> > @@ -84,6 +84,7 @@ AR_FLAGS = rc
> >
> > CC = @CC@
> > CFLAGS = @CFLAGS@
> > +WERROR = @WERROR@
> > RANLIB = @RANLIB@
> > LN_S = @LN_S@
> >
> > diff --git a/libgcc/config/aarch64/t-aarch64 
> > b/libgcc/config/aarch64/t-aarch64
> > index b70e7b94edd..ae1588ce307 100644
> > --- a/libgcc/config/aarch64/t-aarch64
> > +++ b/libgcc/config/aarch64/t-aarch64
> > @@ -30,3 +30,4 @@ LIB2ADDEH += \
> >$(srcdir)/config/aarch64/__arm_za_disable.S
> >
> > SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
> > +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
> > diff --git a/libgcc/configure b/libgcc/configure
> > index cff1eff9625..ae56f7dbdc9 100755
> > --- a/libgcc/configure
> > +++ b/libgcc/configure
> > @@ -592,6 +592,7 @@ enable_execute_stack
> > asm_hidden_op
> > extra_parts
> > cpu_type
> > +WERROR
> > get_gcc_base_ver
> > HAVE_STRUB_SUPPORT
> > thread_header
> > @@ -719,6 +720,7 @@ enable_tm_clone_registry
> > with_glibc_version
> > enable_tls
> > with_gcc_major_version_only
> > +enable_werror_always
> > '
> >   ac_precious_vars='build_alias
> > host_alias
> > @@ -1361,6 +1363,7 @@ Optional Features:
> >   installations without PT_GNU_EH_FRAME support
> >   --disable-tm-clone-registrydisable TM clone registry
> >   --enable-tlsUse thread-local storage [default=yes]
> > +  --enable-werror-always  enable -Werror despite compiler version
> >
> > Optional Packages:
> >   --with-PACKAGE[=ARG]use PACKAGE [ARG=yes]
> > @@ -5808,6 +5811,34 @@ fi
> >
> >
> >
> > +# Only enable with --enable-werror-always until existing warnings are
> > +# corrected.
> > +ac_ext=c
> > +ac_cpp='$CPP $CPPFLAGS'
> > +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
> > +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS 
> > conftest.$ac_ext $LIBS >&5'
> > +ac_compiler_gnu=$ac_cv_c_compiler_gnu
> > +
> > +WERROR=
> > +# Check whether --enable-werror-always was given.
> > +if test "${enable_werror_always+set}" = set; then :
> > +  enableval=$enable_werror_always;
> > +else
> > +  enable_werror_always=no
> > +fi
> > +
> > +if test $enable_werror_always = yes; then :
> > +  WERROR="$WERROR${WERROR:+ }-Werror"
> > +fi
> > +
> > +ac_ext=c
> > +ac_cpp='$CPP $CPPFLAGS'
> > +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
> > +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS 
> > conftest.$ac_ext $LIBS >&5'
> > +ac_compiler_gnu=$ac_cv_c_compiler_gnu
> > +
> > +
> > +
> > # Substitute configuration variables
> >
> >
> > diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> > index 4e8c036990f..6b3ea2aea5c 100644
> > --- a/libgcc/configure.ac
> > +++ b/libgcc/configure.ac
> > @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
> > sinclude(../config/gthr.m4)
> > sinclude(../config/sjlj.m4)
> > sinclude(../config/cet.m4)
> > +sinclude(../config/warnings.m4)
> >
> > AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
> > AC_CONFIG_SRCDIR([static-object.mk])
> > @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
> > # Determine what GCC version number to use in filesystem paths.
> > GCC_BASE_VER
> >
> > +# Only enable with --enable-werror-always until existing warnings are
> > +# corrected.
> > +ACX_PRO

Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-04 Thread Jeff Law





On 10/4/24 12:42 AM, Richard Biener wrote:

On Thu, Oct 3, 2024 at 3:15 AM Andrew Waterman  wrote:


On Wed, Oct 2, 2024 at 4:41 PM Jeff Law  wrote:




On 10/2/24 4:39 PM, Andrew Waterman wrote:

On Wed, Oct 2, 2024 at 5:56 AM Jeff Law  wrote:




On 9/5/24 12:52 PM, Palmer Dabbelt wrote:

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

PR target/116615
* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.

So on the BPI  this is a pretty clear win.  Not surprisingly perlbench
and gcc are the big winners.  It somewhat surprisingly regresses x264,
deepsjeng & leela, but the magnitudes are smaller.  The net from a cycle
perspective is 2.4%.  Every benchmark looks better from a branch count
perspective.

So in my mind it's just a matter of fixing any testsuite fallout (I
would expect some) and this is OK.


Jeff, were you able to measure the change in static code size, too?
These results are very encouraging, but I'd like to make sure we don't
need to retain the current behavior when optimizing for size.

Codesize is ever so slightly worse.  As in less than .1%.  Not worth it
in my mind to do something different in that range.


It probably helps code-size when not optimizing for size depending on
how you align jumps.

By default we aren't aligning jumps at all.  The infrastructure is in
place to allow uarchs to select their preferences though (we're using
that infrastructure internally).


jeff
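
For readers less familiar with the trade-off under discussion, here is a small source-level sketch. It assumes that the macro governs whether the compiler prefers to turn a short-circuit `a && b` into a plain `a & b` when that is safe; the function names are illustrative and not taken from GCC.

```cpp
#include <cassert>

// Short-circuit form: the second comparison sits behind a branch.
bool in_range_branchy(int x, int lo, int hi) {
  return x >= lo && x <= hi;
}

// Non-short-circuit form: both comparisons are always evaluated and
// combined with a bitwise AND.  This is only valid because neither
// operand has side effects or can trap.
bool in_range_branchless(int x, int lo, int hi) {
  return (x >= lo) & (x <= hi);
}

// Both forms agree for every input in this small range.
void check_equivalence() {
  for (int x = -2; x <= 12; ++x)
    assert(in_range_branchy(x, 0, 10) == in_range_branchless(x, 0, 10));
}
```

The second form trades a branch for an extra logical operation, which is the kind of cost balance the benchmark numbers above are measuring.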

Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-10-04 Thread Christophe Lyon

Hi Thiago,


On Fri, 1 Mar 2024 at 15:29, Richard Earnshaw (lists)
 wrote:
>
> On 01/03/2024 14:23, Andre Vieira (lists) wrote:
> > Hi Thiago,
> >
> > Thanks for this, LGTM but I can't approve this, CC'ing Richard.
> >
> > Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite' 
> > from bullet points 2-4.
> >
>
> Yes, this is OK with the change Andre mentioned (your push will fail if you 
> don't fix that).
>
> R.
>
> PS, if you've set up GCC git customizations (see 
> contrib/gcc-git-customization.sh), you can verify things like this with 'git 
> gcc-verify HEAD^..HEAD'
>

ISTM you have forgotten to commit this patch.
If you don't have commit rights, I can do it for you.

Thanks,

Christophe

>
> > Kind regards,
> > Andre
> >
> > On 13/01/2024 00:55, Thiago Jung Bauermann wrote:
> >> Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into
> >> permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration
> >> into a permerror") these tests fail with errors such as:
> >>
> >>FAIL: gcc.target/arm/pr59858.c (test for excess errors)
> >>FAIL: gcc.target/arm/pr65647.c (test for excess errors)
> >>FAIL: gcc.target/arm/pr65710.c (test for excess errors)
> >>FAIL: gcc.target/arm/pr97969.c (test for excess errors)
> >>
> >> Here's one example of the excess errors:
> >>
> >>FAIL: gcc.target/arm/pr65647.c (test for excess errors)
> >>Excess errors:
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: 
> >> initialization of 'int' from 'int *' makes integer from pointer without a 
> >> cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: 
> >> initialization of 'int' from 'int *' makes integer from pointer without a 
> >> cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: 
> >> initialization of 'int' from 'int *' makes integer from pointer without a 
> >> cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: 
> >> initialization of 'int' from 'int *' makes integer from pointer without a 
> >> cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: 
> >> initialization of 'int' from 'int *' makes integer from pointer without a 
> >> cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: 
> >> initialization of 'int' from 'int *' makes integer from pointer without a 
> >> cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: 
> >> initialization of 'int' from 'struct S1 *' makes integer from pointer 
> >> without a cast [-Wint-conversion]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: 
> >> implicit declaration of function 'fn3'; did you mean 'fn2'? 
> >> [-Wimplicit-function-declaration]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: 
> >> implicit declaration of function 'fn5'; did you mean 'fn4'? 
> >> [-Wimplicit-function-declaration]
> >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: 
> >> implicit declaration of function 'fn6'; did you mean 'fn4'? 
> >> [-Wimplicit-function-declaration]
> >>
> >> PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE.
> >> PR target/65647 and PR target/97969 test for a compilation infinite loop.
> >>
> >> Therefore, add -fpermissive so that the tests behave as they did 
> >> previously.
> >> Tested on armv8l-linux-gnueabihf.
> >>
> >> gcc/testsuite/ChangeLog:
> >> * gcc.target/arm/pr59858.c: Add -fpermissive.
> >> * gcc/testsuite/gcc.target/arm/pr65647.c: Likewise.
> >> * gcc/testsuite/gcc.target/arm/pr65710.c: Likewise.
> >> * gcc/testsuite/gcc.target/arm/pr97969.c: Likewise.
> >> ---
> >>   gcc/testsuite/gcc.target/arm/pr59858.c | 2 +-
> >>   gcc/testsuite/gcc.target/arm/pr65647.c | 2 +-
> >>   gcc/testsuite/gcc.target/arm/pr65710.c | 2 +-
> >>   gcc/testsuite/gcc.target/arm/pr97969.c | 2 +-
> >>   4 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
> >> b/gcc/testsuite/gcc.target/arm/pr59858.c
> >> index 3360b48e8586..9336edfce277 100644
> >> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
> >> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
> >> @@ -1,5 +1,5 @@
> >>   /* { dg-do compile } */
> >> -/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
> >> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
> >> -fPIC -w" } */
> >> +/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
> >> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
> >> -fPIC -w -fpermissive" } */
> >>   /* { dg-require-effective-target fpic } */
> >>   /* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft 
> >> -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
> >>   /* { dg-require-effective-

[PATCH] [PR116831] match.pd: Check trunc_mod vector obtap before folding.

2024-10-04 Thread Jennifer Schmitz

As in https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663185.html,
this patch guards the simplification x / y * y == x -> x % y == 0 in
match.pd for vector types by a check for:
1) Support of the mod optab for vectors OR
2) Application before vector lowering for non-VL vectors.

The patch was bootstrapped and tested with no regression on
aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
PR tree-optimization/116831
* match.pd: Guard simplification to trunc_mod with check for
mod optab support.

gcc/testsuite/
PR tree-optimization/116831
* gcc.dg/torture/pr116831.c: New test.


0001-PR116831-match.pd-Check-trunc_mod-vector-obtap-befor.patch
Description: Binary data
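
As a scalar illustration of the identity being guarded (the patch itself concerns vector types), the following sketch compares the two forms. The function names are illustrative only, and the inputs are assumed to avoid y == 0 and the INT_MIN / -1 overflow case:

```cpp
#include <cassert>

// Left-hand side of the simplification: x / y * y == x.
bool divides_via_mul(int x, int y) {
  assert(y != 0);
  return x / y * y == x;
}

// Right-hand side: x % y == 0.  With truncating division,
// x == (x / y) * y + x % y, so the two tests agree.
bool divides_via_mod(int x, int y) {
  assert(y != 0);
  return x % y == 0;
}
```

The second form is only a win when the target can compute the remainder cheaply, which is why the vector case needs the optab check described above.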


Re: [PATCH] aarch64: Expand CTZ to RBIT + CLZ for SVE [PR109498]

2024-10-04 Thread Soumya AR



> On 1 Oct 2024, at 6:17 PM, Richard Sandiford  
> wrote:
>
> Soumya AR  writes:
>> Currently, we vectorize CTZ for SVE by using the following operation:
>> .CTZ (X) = (PREC - 1) - .CLZ (X & -X)
>>
>> Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in 
>> PR109498.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Soumya AR 
>>
>> gcc/ChangeLog:
>>  PR target/109498
>>  * config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand
>>CTZ to RBIT + CLZ for SVE.
>>
>> gcc/testsuite/ChangeLog:
>>  PR target/109498
>>  * gcc.target/aarch64/sve/ctz.c: New test.
>
> Generally looks good, but a couple of comments:
>
>> ---
>> gcc/config/aarch64/aarch64-sve.md  | 16 +++
>> gcc/testsuite/gcc.target/aarch64/sve/ctz.c | 49 ++
>> 2 files changed, 65 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve.md 
>> b/gcc/config/aarch64/aarch64-sve.md
>> index bfa28849adf..10094f156b3 100644
>> --- a/gcc/config/aarch64/aarch64-sve.md
>> +++ b/gcc/config/aarch64/aarch64-sve.md
>> @@ -3088,6 +3088,22 @@
>> ;; - NOT
>> ;; -
>>
>> +(define_expand "ctz<mode>2"
>> +  [(set (match_operand:SVE_I 0 "register_operand")
>> + (unspec:SVE_I
>> +   [(match_dup 2)
>> +(ctz:SVE_I
>> +  (match_operand:SVE_I 1 "register_operand"))]
>> +   UNSPEC_PRED_X))]
>> +  "TARGET_SVE"
>> +  {
>> + operands[2] = aarch64_ptrue_reg (mode);
>
> There's no real need to use operands[...] here.  It can just be
> a local variable.
>
>> + emit_insn (gen_aarch64_pred_rbit (operands[0], 
>> operands[2],operands[1]));
>> + emit_insn (gen_aarch64_pred_clz (operands[0], operands[2], 
>> operands[0]));
>
> Formatting nit: C++ lines should be 80 characters or fewer.
>
> More importantly, I think we should use a fresh register for the
> temporary (RBIT) result, since that tends to permit more optimisation.

Thanks for the feedback! Attaching an updated patch with the suggested changes.

Regards,
Soumya

> Thanks,
> Richard
>
>
>
>> + DONE;
>> +  }
>> +)
>> +
>> ;; Unpredicated integer unary arithmetic.
>> (define_expand "2"
>>   [(set (match_operand:SVE_I 0 "register_operand")
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/ctz.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>> new file mode 100644
>> index 000..433a9174f48
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>> @@ -0,0 +1,49 @@
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-options "-O3 --param aarch64-autovec-preference=sve-only" } */
>> +
>> +#include <stdint.h>
>> +
>> +#define FUNC(FUNCTION, NAME, DTYPE) \
>> +void \
>> +NAME (DTYPE *__restrict x, DTYPE *__restrict y, int n) { \
>> +  for (int i = 0; i < n; i++)\
>> +x[i] = FUNCTION (y[i]); \
>> +}\
>> +
>> +
>> +/*
>> +** ctz_uint8:
>> +**   ...
>> +**   rbitz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b
>> +**   clz z[0-9]+\.b, p[0-7]/m, z[0-9]+\.b
>> +**   ...
>> +*/
>> +FUNC (__builtin_ctzg, ctz_uint8, uint8_t)
>> +
>> +/*
>> +** ctz_uint16:
>> +**   ...
>> +**   rbitz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h
>> +**   clz z[0-9]+\.h, p[0-7]/m, z[0-9]+\.h
>> +**   ...
>> +*/
>> +FUNC (__builtin_ctzg, ctz_uint16, uint16_t)
>> +
>> +/*
>> +** ctz_uint32:
>> +**   ...
>> +**   rbitz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s
>> +**   clz z[0-9]+\.s, p[0-7]/m, z[0-9]+\.s
>> +**   ...
>> +*/
>> +FUNC (__builtin_ctz, ctz_uint32, uint32_t)
>> +
>> +/*
>> +** ctz_uint64:
>> +**   ...
>> +**   rbitz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d
>> +**   clz z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d
>> +**   ...
>> +*/
>> +FUNC (__builtin_ctzll, ctz_uint64, uint64_t)
>> +
>> --
>> 2.43.2






0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch
Description: 0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch
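
The two CTZ expansions discussed in this thread can be checked against each other on scalars. The sketch below is an assumption-laden illustration for 32-bit values: clz32 and rbit32 are simple portable loops written for this example, not the SVE instructions, and all functions require a nonzero input.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of a nonzero 32-bit value, one bit at a time.
int clz32(std::uint32_t x) {
  assert(x != 0);
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Reverse the bit order of a 32-bit value (what RBIT does per element).
std::uint32_t rbit32(std::uint32_t x) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

// Previous expansion: CTZ (X) = (PREC - 1) - CLZ (X & -X).
int ctz_via_isolate(std::uint32_t x) {
  return 31 - clz32(x & (0u - x));
}

// Expansion from the patch: CTZ (X) = CLZ (RBIT (X)).
int ctz_via_rbit(std::uint32_t x) {
  return clz32(rbit32(x));
}
```

Both compute the same result for nonzero inputs; the RBIT form replaces the negate, AND and subtract with a single bit reversal.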

[PATCH] tree-optimization/99856 - fix testcase

2024-10-04 Thread Richard Biener

When making the testcase use aligned accesses I botched up the
copy&paste.  Fixed.

Pushed.

PR tree-optimization/99856
* gcc.dg/vect/pr99856.c: Fix copy&paste errors.
---
 gcc/testsuite/gcc.dg/vect/pr99856.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr99856.c 
b/gcc/testsuite/gcc.dg/vect/pr99856.c
index e5d2a45be57..1ff20c7bc56 100644
--- a/gcc/testsuite/gcc.dg/vect/pr99856.c
+++ b/gcc/testsuite/gcc.dg/vect/pr99856.c
@@ -17,8 +17,8 @@ opSourceOver_premul(uint8_t* restrict Rrgba,
 const uint8_t* restrict Drgba, int len)
 {
   Rrgba = __builtin_assume_aligned (Rrgba, __BIGGEST_ALIGNMENT__);
-  Srgba = __builtin_assume_aligned (Rrgba, __BIGGEST_ALIGNMENT__);
-  Drgba = __builtin_assume_aligned (Rrgba, __BIGGEST_ALIGNMENT__);
+  Srgba = __builtin_assume_aligned (Srgba, __BIGGEST_ALIGNMENT__);
+  Drgba = __builtin_assume_aligned (Drgba, __BIGGEST_ALIGNMENT__);
   int i = 0;
   for (; i < len*4; i += 4)
 {
-- 
2.43.0
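
As an aside on why the slip mattered: __builtin_assume_aligned returns its
first argument, so passing Rrgba three times made all three pointers refer
to the destination buffer.  A small sketch of that effect; source_survives
is a hypothetical helper written for this note, not code from the testcase:

```cpp
#include <cstdint>

// Hypothetical helper: mimics the assignments at the top of
// opSourceOver_premul and reports whether the source pointer still refers
// to its own buffer afterwards.  Both buffers are assumed to be aligned to
// __BIGGEST_ALIGNMENT__, as the testcase assumes.
static bool source_survives(std::uint8_t* r, const std::uint8_t* s,
                            bool botched) {
  const std::uint8_t* const original_s = s;
  r = static_cast<std::uint8_t*>(
      __builtin_assume_aligned(r, __BIGGEST_ALIGNMENT__));
  // Passing r here (the copy&paste slip) silently redirects s to the
  // destination buffer; passing s keeps it where it was.
  s = static_cast<const std::uint8_t*>(
      __builtin_assume_aligned(botched ? r : s, __BIGGEST_ALIGNMENT__));
  return s == original_s;
}
```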

[committed] libstdc++: Replace implicit lambda capture of 'this' [PR116964]

2024-10-04 Thread Jonathan Wakely

Tested x86_64-linux (and tested the affected code by manually bodging
the _GLIBCXX_USE_PTHREAD_RWLOCK_T macro).

Pushed to trunk.

-- >8 --

C++20 deprecates implicit capture of 'this', so change [=] to [this] for
all lambda expressions in <shared_mutex>. This only shows up on targets
where _GLIBCXX_USE_PTHREAD_RWLOCK_T is not defined, as we have an
alternative implementation of shared mutexes in that case.

libstdc++-v3/ChangeLog:

PR libstdc++/116964
* include/std/shared_mutex (__shared_mutex_cv): Use [this] for
lambda captures.
(shared_timed_mutex) [!_GLIBCXX_USE_PTHREAD_RWLOCK_T]: Likewise.
---
 libstdc++-v3/include/std/shared_mutex | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/shared_mutex 
b/libstdc++-v3/include/std/shared_mutex
index 9bf98c0b040..b369a15cc60 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -332,10 +332,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   unique_lock<mutex> __lk(_M_mut);
   // Wait until we can set the write-entered flag.
-  _M_gate1.wait(__lk, [=]{ return !_M_write_entered(); });
+  _M_gate1.wait(__lk, [this]{ return !_M_write_entered(); });
   _M_state |= _S_write_entered;
   // Then wait until there are no more readers.
-  _M_gate2.wait(__lk, [=]{ return _M_readers() == 0; });
+  _M_gate2.wait(__lk, [this]{ return _M_readers() == 0; });
 }
 
 bool
@@ -367,7 +367,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 lock_shared()
 {
   unique_lock<mutex> __lk(_M_mut);
-  _M_gate1.wait(__lk, [=]{ return _M_state < _S_max_readers; });
+  _M_gate1.wait(__lk, [this]{ return _M_state < _S_max_readers; });
   ++_M_state;
 }
 
@@ -690,13 +690,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
unique_lock<mutex> __lk(_M_mut);
if (!_M_gate1.wait_until(__lk, __abs_time,
-[=]{ return !_M_write_entered(); }))
+[this]{ return !_M_write_entered(); }))
  {
return false;
  }
_M_state |= _S_write_entered;
if (!_M_gate2.wait_until(__lk, __abs_time,
-[=]{ return _M_readers() == 0; }))
+[this]{ return _M_readers() == 0; }))
  {
_M_state ^= _S_write_entered;
// Wake all threads blocked while the write-entered flag was set.
@@ -716,7 +716,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
unique_lock<mutex> __lk(_M_mut);
if (!_M_gate1.wait_until(__lk, __abs_time,
-[=]{ return _M_state < _S_max_readers; }))
+[this]{ return _M_state < _S_max_readers; }))
  {
return false;
  }
-- 
2.46.1
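
To illustrate the change in isolation: a lambda inside a member function
that reads data members needs 'this', and C++20 deprecates getting it
implicitly through [=].  A minimal sketch; Gate, state and max_readers are
made-up stand-ins, not the libstdc++ members:

```cpp
// Hypothetical stand-in for the shared-mutex state; only the capture style
// matters here, not the locking logic.
struct Gate {
  unsigned state = 0;
  static constexpr unsigned max_readers = 3;

  bool can_add_reader() {
    // [this] names the capture explicitly.  The lambda still reads the
    // current value of the member through the captured pointer, exactly
    // as the old [=] form did.
    auto pred = [this] { return state < max_readers; };
    return pred();
  }
};
```

Either spelling captures the pointer rather than a copy of the object, so
the predicate observes later updates to the members; the explicit form just
avoids the deprecation warning.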

[PATCH] Relax gcc.dg/vect/pr65947-8.c

2024-10-04 Thread Richard Biener

When failing using forced SLP we do not print the non-SLP failure
mode, which reads slightly differently.  Massage the expectation a bit.

Pushed.

* gcc.dg/vect/pr65947-8.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/pr65947-8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
index 9ced4dbb69f..827575778f8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-8.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
@@ -43,4 +43,4 @@ main (void)
 
 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { 
vect_fold_extract_last } } } } } */
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { 
vect_fold_extract_last } } } } */
-/* { dg-final { scan-tree-dump "multiple types in double reduction or 
condition reduction" "vect" { target { ! { vect_fold_extract_last } } } } } */
+/* { dg-final { scan-tree-dump "multiple types in\[^\\n\\r\]* condition 
reduction" "vect" { target { ! { vect_fold_extract_last } } } } } */
-- 
2.43.0

[PATCH] Add single-lane SLP support to .GOMP_SIMD_LANE vectorization

2024-10-04 Thread Richard Biener

The following adds basic support for single-lane SLP .GOMP_SIMD_LANE
vectorization, in particular it enables SLP discovery.

* tree-vect-slp.cc (no_arg_map): New.
(vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE.
(vect_build_slp_tree_1): Likewise.
* tree-vect-stmts.cc (vectorizable_call): Handle single-lane SLP
for .GOMP_SIMD_LANE calls.
---
 gcc/tree-vect-slp.cc   | 11 +++
 gcc/tree-vect-stmts.cc | 27 +++
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 2274d0e428e..125e69cf0eb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -507,6 +507,7 @@ static const int cond_expr_maps[3][5] = {
   { 4, -2, -1, 1, 2 },
   { 4, -1, -2, 2, 1 }
 };
+static const int no_arg_map[] = { 0 };
 static const int arg0_map[] = { 1, 0 };
 static const int arg1_map[] = { 1, 1 };
 static const int arg2_map[] = { 1, 2 };
@@ -587,6 +588,9 @@ vect_get_operand_map (const gimple *stmt, bool 
gather_scatter_p = false,
  case IFN_CTZ:
return arg0_map;
 
+ case IFN_GOMP_SIMD_LANE:
+   return no_arg_map;
+
  default:
break;
  }
@@ -1175,6 +1179,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  ldst_p = true;
  rhs_code = CFN_MASK_STORE;
}
+ else if (cfn == CFN_GOMP_SIMD_LANE)
+   ;
  else if ((cfn != CFN_LAST
&& cfn != CFN_MASK_CALL
&& internal_fn_p (cfn)
@@ -1273,6 +1279,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  need_same_oprnds = true;
  first_op1 = gimple_call_arg (call_stmt, 1);
}
+ else if (rhs_code == CFN_GOMP_SIMD_LANE)
+   {
+ need_same_oprnds = true;
+ first_op1 = gimple_call_arg (call_stmt, 1);
+   }
}
   else
{
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 584be52f423..b5dd03b25a4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3392,7 +3392,7 @@ vectorizable_call (vec_info *vinfo,
   if (ifn == IFN_LAST && !fndecl)
 {
   if (cfn == CFN_GOMP_SIMD_LANE
- && !slp_node
+ && (!slp_node || SLP_TREE_LANES (slp_node) == 1)
  && loop_vinfo
  && LOOP_VINFO_LOOP (loop_vinfo)->simduid
  && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
@@ -3538,18 +3538,15 @@ vectorizable_call (vec_info *vinfo,
  /* Build argument list for the vectorized call.  */
  if (slp_node)
{
- vec<tree> vec_oprnds0;
-
+ unsigned int vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
  vect_get_slp_defs (vinfo, slp_node, &vec_defs);
- vec_oprnds0 = vec_defs[0];
 
  /* Arguments are ready.  Create the new vector stmt.  */
- FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_oprnd0)
+ for (i = 0; i < vec_num; ++i)
{
  int varg = 0;
  if (masked_loop_p && reduc_idx >= 0)
{
- unsigned int vec_num = vec_oprnds0.length ();
  /* Always true for SLP.  */
  gcc_assert (ncopies == 1);
  vargs[varg++] = vect_get_loop_mask (loop_vinfo,
@@ -3590,11 +3587,26 @@ vectorizable_call (vec_info *vinfo,
  vect_finish_stmt_generation (vinfo, stmt_info,
   new_stmt, gsi);
}
+ else if (cfn == CFN_GOMP_SIMD_LANE)
+   {
+ /* ???  For multi-lane SLP we'd need to build
+{ 0, 0, .., 1, 1, ... }.  */
+ tree cst = build_index_vector (vectype_out,
+i * nunits_out, 1);
+ tree new_var
+   = vect_get_new_ssa_name (vectype_out, vect_simple_var,
+"cst_");
+ gimple *init_stmt = gimple_build_assign (new_var, cst);
+ vect_init_vector_1 (vinfo, stmt_info, init_stmt, NULL);
+ new_temp = make_ssa_name (vec_dest);
+ new_stmt = gimple_build_assign (new_temp, new_var);
+ vect_finish_stmt_generation (vinfo, stmt_info, new_stmt,
+  gsi);
+   }
  else
{
  if (len_opno >= 0 && len_loop_p)
{
- unsigned int vec_num = vec_oprnds0.length ();
  /* Always true for SLP.  */
  gcc_assert (ncopies == 1);
  tree len
@@ -3608,7 +3620,6 @@ vectorizable_call (vec_info *vinfo,

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-04 Thread Jonathan Wakely

On Fri, 4 Oct 2024 at 13:53, Dmitry Ilvokhin wrote:
>
> On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote:
> > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote:
> > >
> > > On Fri, 4 Oct 2024 at 07:53, Richard Biener wrote:
> > > >
> > > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote:
> > > > >
> > > > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote:
> > > > > >
> > > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin wrote:
> > > > > > >
> > > > > > > Instead of looping over every byte of the tail, unroll the
> > > > > > > loop manually using a switch statement; compilers (at least
> > > > > > > GCC and Clang) will then generate a jump table [1], which is
> > > > > > > faster on a microbenchmark [2].
> > > > > > >
> > > > > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > > > > >
> > > > > > > libstdc++-v3/ChangeLog:
> > > > > > >
> > > > > > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > > > > > >   loop using switch statement.
> > > > > > >
> > > > > > > Signed-off-by: Dmitry Ilvokhin 
> > > > > > > ---
> > > > > > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > > > > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > index 3665375096a..294a7323dd0 100644
> > > > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > @@ -50,10 +50,29 @@ namespace
> > > > > > >load_bytes(const char* p, int n)
> > > > > > >{
> > > > > > >  std::size_t result = 0;
> > > > > > > ---n;
> > > > > > > -do
> > > > > > > > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > > > > -while (--n >= 0);
> > > > > >
> > > > > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > > > > only hash the first 8 bytes.
> > > > >
> > > > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> > > >
> > > > The compiler should do such transforms - you probably want to tell
> > > > it that n < 8 though, it likely doesn't (always) know.
> > >
> > > e.g. like this?
> > >
> > > if ((n & 7) != n)
> > >   __builtin_unreachable();
> > >
> > > For the microbenchmark that seems to make things consistently worse:
> > > https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
> >
> > Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <=
> > n is always true too.
> >
>
> GCC still wasn't able to unroll the loop, even with a
> __builtin_unreachable, but the benchmark link you mentioned above uses
> the -O2 optimization level (not sure if that was intentional).

That was intentional, because that's how libsupc++/hash_bytes.cc gets compiled.

>
> If we use -O3 [1], then GCC is able to unroll the loop for the
> load_bytes_loop_assume version, but at the same time I am not sure all
> loop control instructions were elided; I can still see them in the
> Godbolt version of the generated code [2]. The benchmark charts partially
> confirm that, because the performance of load_bytes_loop and
> load_bytes_loop_assume is now quite close (the same actually, except for
> the case n = 1). I guess that would make sense, as we execute the same
> number of instructions.
>
> In addition, the chart for load_bytes_switch looks quite jumpy for [1] and
> became better for the cases n = 1 and n = 2. At this point I am not sure
> that this is not a code alignment issue and that we are not measuring noise.
>
> [1]: https://quick-bench.com/q/LlcgMVhL61CasZVjCWbHd3uid8w
> [2]: https://godbolt.org/z/qPf1n7xWs
>
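
For reference while reading the thread, the two shapes being compared can be
sketched as below.  The loop is the code the patch removes; the switch is
only one plausible form of the replacement (the added lines are not quoted
above), and it assumes a 64-bit std::size_t and 1 <= n < 8 as discussed:

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(std::size_t) >= 8, "sketch assumes 64-bit size_t");

// Loop form: folds p[n-1] .. p[0] so that p[0] ends up in the low byte.
static std::size_t load_bytes_loop(const char* p, int n) {
  assert(1 <= n && n < 8);
  std::size_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Switch form (assumed shape): each case places one byte and falls through
// to the cases for the lower bytes, so no loop counter is needed.
static std::size_t load_bytes_switch(const char* p, int n) {
  assert(1 <= n && n < 8);
  std::size_t result = 0;
  switch (n) {
    case 7: result |= std::size_t(static_cast<unsigned char>(p[6])) << 48;
      // fallthrough
    case 6: result |= std::size_t(static_cast<unsigned char>(p[5])) << 40;
      // fallthrough
    case 5: result |= std::size_t(static_cast<unsigned char>(p[4])) << 32;
      // fallthrough
    case 4: result |= std::size_t(static_cast<unsigned char>(p[3])) << 24;
      // fallthrough
    case 3: result |= std::size_t(static_cast<unsigned char>(p[2])) << 16;
      // fallthrough
    case 2: result |= std::size_t(static_cast<unsigned char>(p[1])) << 8;
      // fallthrough
    case 1: result |= std::size_t(static_cast<unsigned char>(p[0]));
  }
  return result;
}
```

Both functions compute the same value for every n in [1, 7]; whether the
switch is actually faster is exactly what the benchmarks above are trying to
establish.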

[committed] libstdc++: Fix some Parallel Mode testsuite failures

2024-10-04 Thread Jonathan Wakely

There are more failures that I haven't found yet, because running make
check-parallel seems to take several days (because I'm running with
GLIBCXX_TESTSUITE_STDS=98,11,14,17,20,23,26).

We can fix the rest later.  Pushed to trunk.

-- >8 --

Some of these are due to no longer using #pragma GCC system_header in
libstdc++ headers, some have been failing for longer and weren't
noticed.

libstdc++-v3/ChangeLog:

* include/parallel/algobase.h (search): Use sequential algorithm
for constant evaluation.
* include/parallel/algorithmfwd.h (search): Add
_GLIBCXX20_CONSTEXPR.
* include/parallel/multiway_merge.h: Remove stray semi-colon.
* include/parallel/multiseq_selection.h: Add diagnostic pragmas
for -Wlong-long warning.
* include/parallel/quicksort.h: Likewise.
* include/parallel/random_number.h: Likewise.
* include/parallel/settings.h: Likewise.
* include/parallel/workstealing.h: Replace ++ and -- on volatile
variables.
* testsuite/17_intro/names.cc: Skip names defined by
.
* testsuite/20_util/pair/dangling_ref.cc: Skip test if Parallel
Mode is enabled.
* testsuite/20_util/tuple/dangling_ref.cc: Likewise.
---
 libstdc++-v3/include/parallel/algobase.h | 6 ++
 libstdc++-v3/include/parallel/algorithmfwd.h | 1 +
 libstdc++-v3/include/parallel/multiseq_selection.h   | 3 +++
 libstdc++-v3/include/parallel/multiway_merge.h   | 2 +-
 libstdc++-v3/include/parallel/quicksort.h| 3 +++
 libstdc++-v3/include/parallel/random_number.h| 5 +
 libstdc++-v3/include/parallel/settings.h | 5 +
 libstdc++-v3/include/parallel/workstealing.h | 4 ++--
 libstdc++-v3/testsuite/17_intro/names.cc | 6 --
 libstdc++-v3/testsuite/20_util/pair/dangling_ref.cc  | 2 +-
 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc | 2 +-
 11 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/parallel/algobase.h 
b/libstdc++-v3/include/parallel/algobase.h
index 67362f4ecaa..b46ed610661 100644
--- a/libstdc++-v3/include/parallel/algobase.h
+++ b/libstdc++-v3/include/parallel/algobase.h
@@ -515,11 +515,17 @@ namespace __parallel
   // Public interface
   template
+_GLIBCXX20_CONSTEXPR
 inline _FIterator1
 search(_FIterator1 __begin1, _FIterator1 __end1,
   _FIterator2 __begin2, _FIterator2 __end2,
   _BinaryPredicate  __pred)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+   return _GLIBCXX_STD_A::search(__begin1, __end1, __begin2, __end2,
+ std::move(__pred));
+#endif
   return __search_switch(__begin1, __end1, __begin2, __end2, __pred,
 std::__iterator_category(__begin1),
 std::__iterator_category(__begin2));
diff --git a/libstdc++-v3/include/parallel/algorithmfwd.h 
b/libstdc++-v3/include/parallel/algorithmfwd.h
index 476072b860a..7c9843ab161 100644
--- a/libstdc++-v3/include/parallel/algorithmfwd.h
+++ b/libstdc++-v3/include/parallel/algorithmfwd.h
@@ -353,6 +353,7 @@ namespace __parallel
__gnu_parallel::sequential_tag);
 
   template
+_GLIBCXX20_CONSTEXPR
 _FIter1
 search(_FIter1, _FIter1, _FIter2, _FIter2, _BiPredicate);
 
diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h 
b/libstdc++-v3/include/parallel/multiseq_selection.h
index 22bd97e6432..53264fd156b 100644
--- a/libstdc++-v3/include/parallel/multiseq_selection.h
+++ b/libstdc++-v3/include/parallel/multiseq_selection.h
@@ -189,9 +189,12 @@ namespace __gnu_parallel
 
   __r = __rd_log2(__nmax) + 1;
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wlong-long" // LL literal
   // Pad all lists to this length, at least as long as any ns[__i],
   // equality iff __nmax = 2^__k - 1.
   __l = (1ULL << __r) - 1;
+#pragma GCC diagnostic pop
 
   for (_SeqNumber __i = 0; __i < __m; __i++)
 {
diff --git a/libstdc++-v3/include/parallel/multiway_merge.h 
b/libstdc++-v3/include/parallel/multiway_merge.h
index e4bd0042282..d894e636a3e 100644
--- a/libstdc++-v3/include/parallel/multiway_merge.h
+++ b/libstdc++-v3/include/parallel/multiway_merge.h
@@ -2067,6 +2067,6 @@ namespace __gnu_parallel
(__seqs_begin, __seqs_end, __target, __length, __comp,
 exact_tag(__tag.__get_num_threads()));
 }
-}; // namespace __gnu_parallel
+} // namespace __gnu_parallel
 
 #endif /* _GLIBCXX_PARALLEL_MULTIWAY_MERGE_H */
diff --git a/libstdc++-v3/include/parallel/quicksort.h 
b/libstdc++-v3/include/parallel/quicksort.h
index a678b6d4690..c728cd91c24 100644
--- a/libstdc++-v3/include/parallel/quicksort.h
+++ b/libstdc++-v3/include/parallel/quicksort.h
@@ -66,12 +66,15 @@ namespace __gnu_parallel
   _ValueType* __samples = static_cast<_ValueType*>
(::operator new(__num_samples * sizeof(_ValueType)));
 
+#pragma GCC

Re: [PATCH] aarch64: Set Armv9-A generic L1 cache line size to 64 bytes

2024-10-04 Thread Kyrylo Tkachov

Hi Richard,

> On 1 Oct 2024, at 13:35, Richard Sandiford  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Kyrylo Tkachov  writes:
>> Hi all,
>> I'd like to use a value of 64 bytes for the L1 cache line size for Armv9-A
>> generic tuning.
>> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
>> to set the std::hardware_destructive_interference_size value which we want to
>> be not overly large when running concurrent applications on large core-count
>> systems.
>> 
>> The generic value for Armv8-A systems and the port baseline is 256 bytes
>> because that's what the A64FX CPU has, as set de-facto in
>> aarch64_override_options_internal.
>> 
>> But for Armv9-A CPUs as far as I know there isn't anything larger
>> than 64 bytes, so we should be able to use the smaller value here and reduce
>> the size of concurrent structs that use
>> std::hardware_destructive_interference_size to pad their fields.
>> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> 
>> WDYT?
> 
> I suppose doing this for a form of generic tuning goes somewhat against:
> 
>  /* Set up parameters to be used in prefetching algorithm.  Do not
> override the defaults unless we are tuning for a core we have
> researched values for.  */
> 

Yeah, I think the intent of that comment is for heuristics that guide the SW
prefetch emission.
I think it was added before the introduction of
std::hardware_destructive_interference_size and its dependence on the L1
cache line size.


> But I agree it doesn't make conceptual sense to constrain a known-to-be
> Armv9-A core based on values that are only needed for Armv8-A cores.
> So no objection from me FWIW.
> 

Thanks, I’ll commit this version but we may want to refactor things a bit in
that area in the future.
It’s also something to consider when adding new Neoverse core support.


> I think we would need to do something else if there are ever Armv9-A
> cores with different L1 cache line sizes though.  E.g. if a new Armv9-A
> core has a 128-byte cache line, we would probably want to set the range
> to [64, 128] rather than the patch's [64, 64], and rather than the
> current [64, 256].

From what I can tell it would be catastrophically bad to have a smaller value
than the real hardware because it would introduce false sharing. Having a
larger value than the HW is not as bad, but suboptimal.
So the compiler would have to pick the largest of the possible values IMO.
Maybe there’s some clever scheme we could invent in
aarch64_override_options_internal to go through all the CPUs of the specified
architecture and above and take the maximum of their sizes automatically.
These values are known to the compiler statically after all.
Thanks,
Kyrill
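
To make the size effect concrete: code that pads fields to the destructive
interference size grows linearly with that value.  A minimal sketch, using a
template parameter in place of std::hardware_destructive_interference_size
so both values can be compared; PaddedPair is a made-up name:

```cpp
#include <cstddef>

// Hypothetical struct: two counters kept on separate cache lines so that
// writers to one do not invalidate the line holding the other.
template <std::size_t Line>
struct PaddedPair {
  alignas(Line) long first;
  alignas(Line) long second;
};

// With a 64-byte line the pair occupies two lines, 128 bytes in total.
static_assert(sizeof(PaddedPair<64>) == 128, "two 64-byte lines");
// With the 256-byte value the same pair takes 512 bytes.
static_assert(sizeof(PaddedPair<256>) == 512, "two 256-byte lines");
```

If the real line were larger than the value used for padding, 'first' and
'second' could share a line again, which is the false-sharing case described
above.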


> 
> Thanks,
> Richard
> 
> 
>> Thanks,
>> Kyrill
>> 
>> 
>>* config/aarch64/tuning_models/generic_armv9_a.h
>>(generic_armv9a_prefetch_tune): Define.
>>(generic_armv9_a_tunings): Use the above.
>> 
>> From 93aa4ec4d972dfff02ccd6751af160ed243aa750 Mon Sep 17 00:00:00 2001
>> From: Kyrylo Tkachov 
>> Date: Fri, 20 Sep 2024 05:11:39 -0700
>> Subject: [PATCH] aarch64: Set Armv9-A generic L1 cache line size to 64 bytes
>> 
>> I'd like to use a value of 64 bytes for the L1 cache line size for Armv9-A
>> generic tuning.
>> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
>> to set the std::hardware_destructive_interference_size value which we want to
>> be not overly large when running concurrent applications on large core-count
>> systems.
>> 
>> The generic value for Armv8-A systems and the port baseline is 256 bytes
>> because that's what the A64FX CPU has, as set de-facto in
>> aarch64_override_options_internal.
>> 
>> But for Armv9-A CPUs as far as I know there isn't anything larger
>> than 64 bytes, so we should be able to use the smaller value here and reduce
>> the size of concurrent structs that use
>> std::hardware_destructive_interference_size to pad their fields.
>> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> 
>>  * config/aarch64/tuning_models/generic_armv9_a.h
>>  (generic_armv9a_prefetch_tune): Define.
>>  (generic_armv9_a_tunings): Use the above.
>> ---
>> gcc/config/aarch64/tuning_models/generic_armv9_a.h | 14 +-
>> 1 file changed, 13 insertions(+), 1 deletion(-)
>> 
>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h 
>> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> index 85ed40f..76b3e4c9cf7 100644
>> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> @@ -207,6 +207,18 @@ static const struct cpu_vector_cost 
>> generic_armv9_a_vector_cost =
>>   &generic_armv9_a_vec_issue_info /* issue_info  */
>> };
>> 
>> +/* Generic prefetch settings (which disable prefetch).  */
>> +static const cpu_prefetch_tune generic_armv9a_prefetch_tune =
>> +{
>> +  0, /* num_slots  */
>> +  -1,

Re: [PATCH 2/3] cfgexpand: Handle scope conflicts better [PR111422]

2024-10-04 Thread Andrew Pinski

On Fri, Oct 4, 2024, 12:07 AM Richard Biener 
wrote:

> On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski 
> wrote:
> >
> > After fixing loop-im to do the correct overflow rewriting
> > for pointer types too, we end up with code like:
> > ```
> >   _9 = (unsigned long) &g;
> >   _84 = _9 + 18446744073709551615;
> >   _11 = _42 + _84;
> >   _44 = (signed char *) _11;
> > ...
> >   *_44 = 10;
> >   g ={v} {CLOBBER(eos)};
> > ...
> >   n[0] = &f;
> >   *_44 = 8;
> >   g ={v} {CLOBBER(eos)};
> > ```
> > Which was not being recognized by the scope conflicts code.
> > This was because it only handled one-level walk backs rather than
> > multiple ones.
> > This fixes it by using a work_list to avoid huge recursion and a visited
> > bitmap to avoid going into an infinite loop when dealing with loops.
>
> Ick.  This is now possibly an unbounded walk from every use (even duplicate
> use!).
> Micro-optimizing would be restricting the INTEGRAL_TYPE_P types to ones
> matching pointer size.  Another micro-optimization would be to track/cache
> whether a SSA def is based on a pointer, more optimizing to cache what
> pointer(s!) it is based on.
>
> There's testcases in bugzilla somewhere hard on compile-time in this code
> and I can imagine a trivial degenerate one to trigger the issue.
>


I was thinking about that too. Adding a cache should be easy, especially one
that lives over the whole walk of the basic blocks. And yes, stopping at
integer types that are narrower than a pointer also seems a reasonable
idea. Note I have a patch on top of this where vector types and
constructs are handled too.
Will work on this tomorrow.

Thanks,
Andrew
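
The walk under discussion has the usual worklist-plus-visited-set shape.  A
toy sketch of that shape, with plain integers standing in for SSA names;
DefGraph and reachable_addrs are invented for this note and do not mirror
GCC's data structures:

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for SSA definitions: node i is either an address-taken
// variable (is_addr[i]) or is computed from the nodes in operands[i].
struct DefGraph {
  std::vector<std::vector<int>> operands;
  std::vector<bool> is_addr;
};

// Collect every address node that `use` may be based on.  The visited
// vector plays the role of the bitmap: each node is expanded at most once,
// so cycles through PHI-like nodes terminate.
static std::vector<int> reachable_addrs(const DefGraph& g, int use) {
  std::vector<int> found;
  std::vector<bool> visited(g.operands.size(), false);
  std::vector<int> work_list{use};
  while (!work_list.empty()) {
    int n = work_list.back();
    work_list.pop_back();
    if (visited[n])
      continue;
    visited[n] = true;
    if (g.is_addr[n]) {
      found.push_back(n);
      continue;
    }
    for (int op : g.operands[n])
      work_list.push_back(op);
  }
  return found;
}
```

In this sketch the visited set is local to one call, so a second use of the
same name repeats the whole walk; that is the cost Richard points out, and a
cache that outlives a single call is what would avoid it.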



> Richard.
>
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > PR tree-optimization/111422
> >
> > gcc/ChangeLog:
> >
> > * cfgexpand.cc (add_scope_conflicts_2): Rewrite to be a full walk
> > of all operands and their uses.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/cfgexpand.cc | 46 +++---
> >  1 file changed, 27 insertions(+), 19 deletions(-)
> >
> > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> > index 6c1096363af..2e653d7207c 100644
> > --- a/gcc/cfgexpand.cc
> > +++ b/gcc/cfgexpand.cc
> > @@ -573,32 +573,40 @@ visit_conflict (gimple *, tree op, tree, void *data)
> >
> >  /* Helper function for add_scope_conflicts_1.  For USE on
> > a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to
> be
> > -   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  */
> > +   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. Also walk
> > +   the assignments backwards as they might be based on an ADDR_EXPR.  */
> >
> > -static inline void
> > +static void
> >  add_scope_conflicts_2 (tree use, bitmap work,
> >walk_stmt_load_store_addr_fn visit)
> >  {
> > -  if (TREE_CODE (use) == SSA_NAME
> > -  && (POINTER_TYPE_P (TREE_TYPE (use))
> > - || INTEGRAL_TYPE_P (TREE_TYPE (use))))
> > +  auto_vec work_list;
> > +  auto_bitmap visited_ssa_names;
> > +  work_list.safe_push (use);
> > +
> > +  while (!work_list.is_empty())
> >  {
> > -  gimple *g = SSA_NAME_DEF_STMT (use);
> > -  if (gassign *a = dyn_cast <gassign *> (g))
> > +  use = work_list.pop();
> > +  if (!use)
> > +   continue;
> > +  if (TREE_CODE (use) == ADDR_EXPR)
> > +   visit (nullptr, TREE_OPERAND (use, 0), use, work);
> > +  else if (TREE_CODE (use) == SSA_NAME
> > +  && (POINTER_TYPE_P (TREE_TYPE (use))
> > +  || INTEGRAL_TYPE_P (TREE_TYPE (use))))
> > {
> > - if (tree op = gimple_assign_rhs1 (a))
> > -   if (TREE_CODE (op) == ADDR_EXPR)
> > - visit (a, TREE_OPERAND (op, 0), op, work);
> > + gimple *g = SSA_NAME_DEF_STMT (use);
> > + if (!bitmap_set_bit (visited_ssa_names, SSA_NAME_VERSION(use)))
> > +   continue;
> > + if (gassign *a = dyn_cast <gassign *> (g))
> > +   {
> > + for (unsigned i = 1; i < gimple_num_ops (g); i++)
> > +   work_list.safe_push (gimple_op (a, i));
> > +   }
> > + else if (gphi *p = dyn_cast <gphi *> (g))
> > +   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> > + work_list.safe_push (gimple_phi_arg_def (p, i));
> > }
> > -  else if (gphi *p = dyn_cast <gphi *> (g))
> > -   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> > - if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
> > -   if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
> > - {
> > -   if (tree op = gimple_assign_rhs1 (a))
> > - if (TREE_CODE (op) == ADDR_EXPR)
> > -   visit (a, TREE_OPERAND (op, 0), op, work);
> > - }
> >  }
> >  }
> >
> > --
> > 2.34.1
> >
>

RE: [PATCH] middle-end: reorder masking priority of math functions

2024-10-04 Thread Tamar Christina

Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Wednesday, October 2, 2024 5:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina ; richard.guent...@gmail.com;
> Victor Do Nascimento 
> Subject: [PATCH] middle-end: reorder masking priority of math functions
> 
> Given the categorization of math built-in functions as `ECF_CONST',
> when if-converting their uses, their calls are not masked and are thus
> called with an all-true predicate.
> 
> This, however, is not appropriate where built-ins have library
> equivalents, wherein they may exhibit highly architecture-specific
> behaviors. For example, vectorized implementations may delegate the
> computation of values outside a certain acceptable numerical range to
> special (non-vectorized) routines which considerably slow down
> computation.
> 
> As numerical simulation programs often do bounds check on input values
> prior to math calls, conditionally assigning default output values for
> out-of-bounds input and skipping the math call altogether, these
> fallback implementations should seldom be called in the execution of
> vectorized code.  If, however, we don't apply any masking to these
> math functions, we end up effectively executing both if and else
> branches for these values, leading to considerable performance
> degradation on scientific workloads.
> 
> We therefore invert the order of handling of math function calls in
> `if_convertible_stmt_p' to prioritize the handling of their
> library-provided implementations over the equivalent internal function.

I think this makes sense to me from a technical standpoint and from an SVE
one.  Though I think the original order may have been there because of the
assumption that on some uarches unpredicated implementations are faster than
predicated ones.

So there may be some concerns about this order being slower for some.
I'll leave it up to Richi since e.g. I don't know the perf characteristics of
the x86 variants here, but if there is a concern you could use the
conditional_operation_is_expensive target hook to decide on the preferred order.

But other than that the change itself looks good to me, but you still need
approval.

Cheers,
Tamar
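
A scalar model may help to see what is at stake.  Without masking, the
if-converted loop calls the math routine for every lane and selects the
result afterwards; with masking, it is only called for lanes that take the
else branch.  The sketch below just counts calls; counted_expf and the two
loop functions are illustrative names, not GCC or libm interfaces:

```cpp
#include <cmath>
#include <cstddef>

static int calls = 0;

// Stand-in for the library routine; counts how often it is reached.
static float counted_expf(float x) {
  ++calls;
  return std::exp(x);
}

// Unmasked model: evaluate on every lane, then select.
static void select_after_call(const float* a, float* b, std::size_t n,
                              float lim, float cst) {
  for (std::size_t i = 0; i < n; ++i) {
    float e = counted_expf(a[i]);
    b[i] = a[i] > lim ? cst : e;
  }
}

// Masked model: only lanes that fail the bounds check reach the routine.
static void call_under_mask(const float* a, float* b, std::size_t n,
                            float lim, float cst) {
  for (std::size_t i = 0; i < n; ++i)
    b[i] = a[i] > lim ? cst : counted_expf(a[i]);
}
```

Both loops produce the same output; the difference is that the unmasked one
also feeds the out-of-range inputs to the routine, which is where a slow
fallback path would be hit.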

> 
> Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
> new regressions.
> 
> gcc/ChangeLog:
> 
>   * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
>   function declaration before IFN fallback.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-fncall-mask-math.c: New.
> ---
>  .../gcc.dg/vect/vect-fncall-mask-math.c   | 33 +++
>  gcc/tree-if-conv.cc   | 18 +-
>  2 files changed, 42 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> new file mode 100644
> index 000..15e22da2807
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> @@ -0,0 +1,33 @@
> +/* Test the correct application of masking to autovectorized math function 
> calls.
> +   Test is currently set to xfail pending the release of the relevant lmvec
> +   support. */
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw 
> -Ofast"
> { target { aarch64*-*-* } } } */
> +
> +#include <math.h>
> +
> +const int N = 20;
> +const float lim = 101.0;
> +const float cst =  -1.0;
> +float tot =   0.0;
> +
> +float b[20];
> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
> + [10 ... 19] = 100.0 };/* Else branch.  */
> +
> +int main (void)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < N; i += 1)
> +{
> +  if (a[i] > lim)
> + b[i] = cst;
> +  else
> + b[i] = expf (a[i]);
> +  tot += b[i];
> +}
> +  return (0);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { 
> xfail {
> aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, 
> _30>} ifcvt
> { xfail { aarch64*-*-* } } } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 3b04d1e8d34..90c754a4814 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt,
> vec refs)
> 
>  case GIMPLE_CALL:
>{
> - /* There are some IFN_s that are used to replace builtins but have the
> -same semantics.  Even if MASK_CALL cannot handle them vectorable_call
> -will insert the proper selection, so do not block conversion.  */
> - int flags = gimple_call_flags (stmt);
> - if ((flags & ECF_CONST)
> - && !(flags & ECF_LOOPING_CONST_OR_PURE)
> - && gimple_call_combined_fn (stmt) != CFN_LAST)
> -   return true;
> -
>   tree fndecl = gimple_call_fndecl (stmt);
>   if (fndecl)
> {
> @@ -1160,6 +1151,15 @@ if_conv

Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Jonathan Wakely

On Fri, 4 Oct 2024 at 13:28, Jakub Jelinek  wrote:
>
> On Fri, Oct 04, 2024 at 12:52:11PM +0100, Jonathan Wakely wrote:
> > This doesn't really belong in our testsuite, because the sole purpose of
> > the new test is to find bugs in the Glibc wrappers (like the one linked
> > below). But maybe it's a kindness to do it in our testsuite, because we
> > already have this test in place, and one Glibc bug was already found
> > thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
> >
> > Should we do this?
>
> I think so.  While those bugs are glibc bugs, libstdc++ uses libc headers
> and so if they have namespace cleanness issues, so does libstdc++.

Yeah, we have lots of #undef in that test to deal with libc headers
that we can't change, but for Glibc we know we can fix problems much
more easily than for e.g. proprietary UNIX headers.

>
> > Add a new testcase that repeats 17_intro/names.cc but with
> > _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
> > https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
> > now).
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/116210
> >   * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
> >   that use it in the fortify wrappers.
> >   * testsuite/17_intro/names_fortify.cc: New test.
>
> Jakub
>

[Patch] OpenMP: Allocate directive for static vars, clean up

2024-10-04 Thread Tobias Burnus


'omp allocate' permits using a different (specified) allocator and
alignment for both stack/automatic and static/saved variables; the latter
take only predefined allocators. Currently, only C and Fortran are
supported, and only for stack/automatic variables; static variables are
rejected before the attached patch.

* * *

I happened to look at the 'allocate' directive recently and, doing so,
I stumbled over a couple of issues, which the attached patch addresses
(missing diagnostics for corner cases, not updated checks, unhelpful
documentation ['allocate' *clause*], ...). Doing so, I wondered whether:

Shouldn't we just accept 'omp allocate' for static variables by honoring
the alignment and ignoring the actually requested allocator? First, we
already do the same for actual allocations, as not all traits are
supported. And for the host this seems to be the most sensible thing to do
in any case.
[For some use cases, pointers + allocation in the constructor would be
better, but in general, not adding an indirection seems to be better and
has fewer corner-case usability issues.]

I guess we later want to honor the requested memory for nvptx and/or gcn; at
least Nvidia GPUs could make use of constant memory (which has advantages when
many threads read the same memory/broadcast it). I guess OpenACC 2.7's
'readonly' modifier serves a similar purpose.
For now we don't, but the attribute is passed on to the backends, which could
make use of them, if desired. ('groupprivate' directive vs. cgroup/thread
allocators are similar device-only features.)

As mentioned, this patch also fixes a few other issues here and there, see
commit log and source code for details.

Code comments? Suggestions or remarks? - Before I apply this patch?

Tobias

PS: I am aware that C++ support is lacking. There is a pending patch that needs
to be updated for this patch, probably some bitrotting, and in particular for
the review comments, cf.
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html
and https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639929.html

OpenMP: Allocate directive for static vars, clean up

For the 'allocate' directive, remove the sorry for static variables and
just keep using normal memory, but honor the requested alignment and set
a DECL_ATTRIBUTE in case a target may want to make use of this later on.
The documentation is updated accordingly.

The C diagnostic to check for predefined allocators in this case failed
to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was already
okay; but both now use new common #defined value for checking.)
And while Fortran common block variables are still rejected, the check
has been improved as before the sorry diagnostic did not work for
common blocks in modules.

Finally, for 'allocate' clause on the target/task/taskloop directives,
there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator
with access = thread), which is undefined behavior according to the
OpenMP specification.

And, last, testing showed that var decl + static_assert sets TREE_USED
but does not produce a statement list in C, which did run into an assert
in gimplify. This special case is now also handled.


gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_allocate): Set alignment for alignof;
	accept static variables and fix predef allocator check.

gcc/fortran/ChangeLog:

	* openmp.cc (is_predefined_allocator): Use gomp-constants.h consts.
	* trans-common.cc (translate_common): Reject OpenMP allocate directives.
	* trans-decl.cc (gfc_finish_var_decl): Handle allocate directive
	for static variables.
	(gfc_trans_deferred_vars): Update for the latter.

gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP
	allocate directive.	
	(gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used
	as allocator with the target/task/taskloop directive.

include/ChangeLog:

	* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX,
	GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX,
	GOMP_OMP_PREDEF_ALLOC_THREADS): New defines.

libgomp/ChangeLog:

	* allocator.c: Add static asserts for new
	GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values.
	* libgomp.texi (OpenMP Impl. Status): Allocate directive for
	static vars is now supported. Refer to PR for allocate clause.
	(Memory allocation): Update for static vars; minor word tweaking.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-9.c: Update for removed sorry.
	* gfortran.dg/gomp/allocate-15.f90: Likewise.
	* gfortran.dg/gomp/allocate-pinned-1.f90: Likewise.
	* gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for
	previously missing diagnostic.
	* c-c++-common/gomp/allocate-18.c: New test.
	* c-c++-common/gomp/allocate-19.c: New test.
	* gfortran.dg/gomp/allocate-clause.f90: New test.
	* gfortran.dg/gomp/allocate-static-2.f90: New test.
	* gfortran.dg/gomp/allocate-static.f90: New test.

 gcc/c/c-parser.cc  |  29

Re: [PATCH 1/2] c++: add -Wdeprecated-literal-operator [CWG2521]

2024-10-04 Thread Jason Merrill


On 10/4/24 8:22 AM, Jakub Jelinek wrote:
> On Fri, Oct 04, 2024 at 12:19:03PM +0200, Jakub Jelinek wrote:
> > Though, maybe the tests should have both the deprecated syntax and the
> > non-deprecated one...
>
> Here is a variant of the patch which does that.
>
> Tested on x86_64-linux and i686-linux, ok for trunk?

OK.


2024-10-04  Jakub Jelinek  

* g++.dg/cpp26/unevalstr1.C: Revert the 2024-10-03 changes, instead
expect extra warnings.  Add another set of tests without space
between " and _.
* g++.dg/cpp26/unevalstr2.C: Expect extra warnings for C++23.  Add
another set of tests without space between " and _.

--- gcc/testsuite/g++.dg/cpp26/unevalstr1.C.jj  2024-10-04 12:28:08.820899177 +0200
+++ gcc/testsuite/g++.dg/cpp26/unevalstr1.C 2024-10-04 14:15:35.563531334 +0200
@@ -83,21 +83,57 @@ extern "\o{0103}" { int f14 (); } // { d
  [[nodiscard ("\x{20}")]] int h19 ();// { dg-error "numeric escape sequence in unevaluated string" }
  [[nodiscard ("\h")]] int h20 ();// { dg-error "unknown escape sequence" }
  
-float operator ""_my0 (const char *);
-float operator "" ""_my1 (const char *);
-float operator L""_my2 (const char *);   // { dg-error "invalid encoding prefix in literal operator" }
-float operator u""_my3 (const char *);   // { dg-error "invalid encoding prefix in literal operator" }
-float operator U""_my4 (const char *);   // { dg-error "invalid encoding prefix in literal operator" }
-float operator u8""_my5 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
-float operator L"" ""_my6 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
-float operator u"" ""_my7 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
-float operator U"" ""_my8 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
-float operator u8"" ""_my9 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
-float operator "" L""_my10 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
-float operator "" u""_my11 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
-float operator "" U""_my12 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
-float operator "" u8""_my13 (const char *);// { dg-error "invalid encoding prefix in literal operator" }
-float operator "\0"_my14 (const char *); // { dg-error "expected empty string after 'operator' keyword" }
-float operator "\x00"_my15 (const char *);   // { dg-error "expected empty string after 'operator' keyword" }
-float operator "\h"_my16 (const char *); // { dg-error "expected empty string after 'operator' keyword" }
+float operator "" _my0 (const char *);
+float operator "" "" _my1 (const char *);
+float operator L"" _my2 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
+float operator u"" _my3 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
+float operator U"" _my4 (const char *);  // { dg-error "invalid encoding prefix in literal operator" }
+float operator u8"" _my5 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
+float operator L"" "" _my6 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
+float operator u"" "" _my7 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
+float operator U"" "" _my8 (const char *); // { dg-error "invalid encoding prefix in literal operator" }
+float operator u8"" "" _my9 (const char *);// { dg-error "invalid encoding prefix in literal operator" }
+float operator "" L"" _my10 (const char *);// { dg-error "invalid encoding prefix in literal operator" }
+float operator "" u"" _my11 (const char *);// { dg-error "invalid encoding prefix in literal operator" }
+float operator "" U"" _my12 (const char *);// { dg-error "invalid encoding prefix in literal operator" }
+float operator "" u8"" _my13 (const char *);   // { dg-error "invalid encoding prefix in literal operator" }
+float operator "\0" _my14 (const char *);// { dg-error "expected empty string after 'operator' keyword" }
+float operator "\x00" _my15 (const char *);  // { dg-error "expected empty string after 'operator' keyword" }
+float operator "\h" _my16 (const char *);// { dg-error "expected empty string after 'operator' keyword" }
+   // { dg-error "unknown escape sequence" "" { target *-*-* } .-1 }
+// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 }
+// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 }
+// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 }
+// { dg-warning "space between quotes

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-04 Thread Dmitry Ilvokhin

On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote:
> On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely  wrote:
> >
> > > On Fri, 4 Oct 2024 at 07:53, Richard Biener  wrote:
> > >
> > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely  wrote:
> > > >
> > > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> > > > >
> > > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > > > > >
> > > > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > > > >
> > > > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > > > >
> > > > > > libstdc++-v3/ChangeLog:
> > > > > >
> > > > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > > > >   loop using switch statement.
> > > > > >
> > > > > > Signed-off-by: Dmitry Ilvokhin 
> > > > > > ---
> > > > > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > index 3665375096a..294a7323dd0 100644
> > > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > @@ -50,10 +50,29 @@ namespace
> > > > > >load_bytes(const char* p, int n)
> > > > > >{
> > > > > >  std::size_t result = 0;
> > > > > > ---n;
> > > > > > -do
> > > > > > > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > > > -while (--n >= 0);
> > > > >
> > > > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > > > only hash the first 8 bytes.
> > > >
> > > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> > >
> > > The compiler should do such transforms - you probably want to tell
> > > it that n < 8 though, it likely doesn't (always) know.
> >
> > e.g. like this?
> >
> > if ((n & 7) != n)
> >   __builtin_unreachable();
> >
> > For the microbenchmark that seems to make things consistently worse:
> > https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
> 
> Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <=
> n is always true too.
>

GCC still wasn't able to unroll the loop, even with
__builtin_unreachable, but the benchmark link you mentioned above uses the
-O2 optimization level (not sure if that was intentional).

With -O3 [1], GCC was able to unroll the loop for the
load_bytes_loop_assume version, but I am not sure all loop control
instructions were elided: I can still see them in the Godbolt version of
the generated code [2]. The benchmark charts partially confirm that,
because the performance of load_bytes_loop and load_bytes_loop_assume is
now quite close (the same, actually, except for the case n = 1). I guess
that makes sense, as we execute the same number of instructions.

In addition, the chart for load_bytes_switch looks quite jumpy for [1] and
became better for the cases n = 1 and n = 2. At this point I cannot rule
out that this is a code alignment issue and that we are measuring noise.

[1]: https://quick-bench.com/q/LlcgMVhL61CasZVjCWbHd3uid8w
[2]: https://godbolt.org/z/qPf1n7xWs
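
For readers following along, the three variants compared above can be sketched roughly as below. This is a minimal sketch, not the code from the patch or from the benchmark links: the function names follow the ones used in this thread, but the bodies, the use of std::uint64_t instead of std::size_t, and the helper all_variants_agree are assumptions made for illustration. It assumes 1 <= n < 8, as implied by the load_bytes(end, len & 0x7) call site, and uses the GCC/Clang builtin __builtin_unreachable only when __GNUC__ is defined.

```cpp
#include <cassert>
#include <cstdint>

// Loop variant: walks the tail bytes from the highest index down to 0,
// producing a little-endian load of n bytes.
inline std::uint64_t
load_bytes_loop(const char* p, int n)
{
  assert(1 <= n && n < 8);
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Same loop, plus a hint telling the compiler the range of n.
inline std::uint64_t
load_bytes_loop_assume(const char* p, int n)
{
#if defined(__GNUC__)
  if (!(1 <= n && n < 8))
    __builtin_unreachable();
#endif
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Manually unrolled variant: each case handles one byte and falls
// through to the cases for the lower indices.
inline std::uint64_t
load_bytes_switch(const char* p, int n)
{
  std::uint64_t result = 0;
  switch (n & 7)
    {
    case 7: result |= std::uint64_t(static_cast<unsigned char>(p[6])) << 48; // fall through
    case 6: result |= std::uint64_t(static_cast<unsigned char>(p[5])) << 40; // fall through
    case 5: result |= std::uint64_t(static_cast<unsigned char>(p[4])) << 32; // fall through
    case 4: result |= std::uint64_t(static_cast<unsigned char>(p[3])) << 24; // fall through
    case 3: result |= std::uint64_t(static_cast<unsigned char>(p[2])) << 16; // fall through
    case 2: result |= std::uint64_t(static_cast<unsigned char>(p[1])) << 8;  // fall through
    case 1: result |= std::uint64_t(static_cast<unsigned char>(p[0]));
    }
  return result;
}

// Hypothetical helper: checks that all three variants return the same
// value for every valid tail length, using bytes with the high bit set.
inline bool
all_variants_agree()
{
  char data[7];
  for (int i = 0; i < 7; ++i)
    data[i] = static_cast<char>(0x80 + 0x11 * (i + 1));
  for (int n = 1; n < 8; ++n)
    {
      const std::uint64_t expected = load_bytes_loop(data, n);
      if (load_bytes_loop_assume(data, n) != expected
          || load_bytes_switch(data, n) != expected)
        return false;
    }
  return true;
}
```

The sketch only demonstrates that the variants compute the same value; it says nothing about which one is faster, which is exactly what the benchmark links are for.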

Re: [patch,testsuite] Some float64 and float32x test require double64plus.

2024-10-04 Thread Mike Stump

On Oct 4, 2024, at 9:40 AM, Georg-Johann Lay  wrote:
> 
> Some of the float64 and float32x test cases are using double built-ins
> and hence require double64plus resp. double_float32xplus, i.e. double
> is at least as good as float32x.
> 
> This patch adds the corresponding dg-require-effective-target filters.
> (But only for test cases where I can verify that they are working
> with double64+ but are failing with double32.)
> 
> Ok for trunk?

Ok.  If you are that domain expert, these sorts of changes are more obvious to 
you than to me. :-)

[PATCH v3 0/5] openmp: Add support for iterators in OpenMP mapping clauses

2024-10-04 Thread Kwok Cheung Yeung


This is a further improved patch series to that posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662138.html

The main change is that the expansion of the iterators is pushed back 
further to the omp-lowering stage. This is because the recently 
committed deep-mapping support (and features such as strided array 
updates in the OpenMP development branch) do their work in the omplower 
stage, but the iterators need to be expanded after any changes to the 
clauses and their decls/sizes have occurred. This patch set does not 
support deep-mapping yet - it just emits a sorry when this happens.


The iterator expansion now does not happen all at once. A new loop is 
generated the first time a clause with a new iterator is found, and is 
reused if the same iterator is used in another clause. Assigning the 
clause decl and size in the iterator loop are now done by calling 
separate functions from lower_target, which also return the new 
hostaddr/size expression from the iterator loop that should be passed to 
libgomp. This fits in better with the existing code structure. A final 
function is called to finalise all the loops. As multiple sets of loops 
can be 'in-flight' at once, a new structure and a hash map are used to 
keep track of their states.
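
The bookkeeping described above can be pictured roughly as follows. This is only an illustrative sketch in plain C++, not code from the patch: iterator_key, loop_state, loop_tracker and their members are hypothetical stand-ins, and std::unordered_map stands in for whichever hash map the implementation actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for an iterator description shared by clauses.
struct iterator_key
{
  int id;
};

// Hypothetical state kept for each loop while it is 'in flight'.
struct loop_state
{
  std::size_t clauses = 0;   // number of clauses that used this loop
  bool finalised = false;    // set once the loop has been closed off
};

class loop_tracker
{
public:
  // The first clause using an iterator creates the loop state;
  // later clauses with the same iterator reuse it.
  loop_state&
  loop_for(const iterator_key* it)
  {
    assert(it != nullptr);
    loop_state& st = loops_[it];   // default-constructed on first use
    ++st.clauses;
    return st;
  }

  // Called once at the end to finalise every loop still in flight.
  void
  finish_all()
  {
    for (auto& entry : loops_)
      entry.second.finalised = true;
  }

  std::size_t
  loop_count() const
  {
    return loops_.size();
  }

private:
  std::unordered_map<const iterator_key*, loop_state> loops_;
};
```

In this sketch, two clauses sharing one iterator get the same loop_state back from loop_for, while a clause with a different iterator gets a fresh one; finish_all then closes every loop in a single pass.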


Instead of making the OMP_CLAUSE_DECL into a tree list with the iterator 
in TREE_PURPOSE and original decl in TREE_VALUE, I have stored the 
iterator in a third argument in the clause tree node addressed by 
OMP_CLAUSE_ITERATOR instead. In this way, changes do not have to be made 
in the intervening code-path to extract the original OMP_CLAUSE_DECL 
(which is messy and error-prone), allowing code unrelated to iterators 
to go unmodified. Nearly all special-cases for iterators have now been 
removed.


I have also fixed some issues detected by Linaro CI - some format 
specifier issues, and some tests that expect a non-unified target 
address space failing.


Gomp GCC tests and libgomp tests run on x86_64 host with Nvidia 
offloading. Okay for trunk?


Kwok

[PATCH v3 1/5] openmp: Refactor handling of iterators

2024-10-04 Thread Kwok Cheung Yeung


This patch factors out the code to calculate the number of iterations
required and to generate the iteration loop into separate functions from
gimplify_omp_depend for reuse later.

I have also replaced the 'TREE_CODE (*tp) == TREE_LIST && ...' checks
used for detecting an iterator clause with a macro OMP_ITERATOR_DECL_P,
as it needs to be done frequently.

From 34bf780b1e0395028ecdacfa1385238a8da13be6 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 4 Oct 2024 15:15:42 +0100
Subject: [PATCH 1/5] openmp: Refactor handling of iterators

Move code to calculate the iteration size and to generate the iterator
expansion loop into separate functions.

Use OMP_ITERATOR_DECL_P to check for iterators in clause declarations.

2024-10-04  Kwok Cheung Yeung  

gcc/c-family/
* c-omp.cc (c_finish_omp_depobj): Use OMP_ITERATOR_DECL_P.

gcc/c/
* c-typeck.cc (handle_omp_array_sections): Use OMP_ITERATOR_DECL_P.
(c_finish_omp_clauses): Likewise.

gcc/cp/
* pt.cc (tsubst_omp_clause_decl): Use OMP_ITERATOR_DECL_P.
* semantics.cc (handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise.

gcc/
* gimplify.cc (gimplify_omp_affinity): Use OMP_ITERATOR_DECL_P.
(compute_iterator_count): New.
(build_iterator_loop): New.
(gimplify_omp_depend): Use OMP_ITERATOR_DECL_P, compute_iterator_count
and build_iterator_loop.
* tree-inline.cc (copy_tree_body_r): Use OMP_ITERATOR_DECL_P.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.h (OMP_ITERATOR_DECL_P): New macro.
---
 gcc/c-family/c-omp.cc|   4 +-
 gcc/c/c-typeck.cc|  13 +-
 gcc/cp/pt.cc |   4 +-
 gcc/cp/semantics.cc  |   8 +-
 gcc/gimplify.cc  | 326 +++
 gcc/tree-inline.cc   |   5 +-
 gcc/tree-pretty-print.cc |   8 +-
 gcc/tree.h   |   6 +
 8 files changed, 175 insertions(+), 199 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 620a3c1353a..24c8a801255 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -744,9 +744,7 @@ c_finish_omp_depobj (location_t loc, tree depobj,
  kind = OMP_CLAUSE_DEPEND_KIND (clause);
  t = OMP_CLAUSE_DECL (clause);
  gcc_assert (t);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  error_at (OMP_CLAUSE_LOCATION (clause),
"%<iterator%> modifier may not be specified on "
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index ba6d96d26b2..30a03f071d8 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14504,9 +14504,7 @@ handle_omp_array_sections (tree &c, enum c_omp_region_type ort)
   tree *tp = &OMP_CLAUSE_DECL (c);
   if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AFFINITY)
-  && TREE_CODE (*tp) == TREE_LIST
-  && TREE_PURPOSE (*tp)
-  && TREE_CODE (TREE_PURPOSE (*tp)) == TREE_VEC)
+  && OMP_ITERATOR_DECL_P (*tp))
 tp = &TREE_VALUE (*tp);
   tree first = handle_omp_array_sections_1 (c, *tp, types,
maybe_zero_len, first_non_one,
@@ -15697,9 +15695,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
case OMP_CLAUSE_DEPEND:
case OMP_CLAUSE_AFFINITY:
  t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  if (TREE_PURPOSE (t) != last_iterators)
last_iterators_remove
@@ -15799,10 +15795,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
  break;
}
}
- if (TREE_CODE (OMP_CLAUSE_DECL (c)) == TREE_LIST
- && TREE_PURPOSE (OMP_CLAUSE_DECL (c))
- && (TREE_CODE (TREE_PURPOSE (OMP_CLAUSE_DECL (c)))
- == TREE_VEC))
+ if (OMP_ITERATOR_DECL_P (OMP_CLAUSE_DECL (c)))
TREE_VALUE (OMP_CLAUSE_DECL (c)) = t;
  else
OMP_CLAUSE_DECL (c) = t;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 43468e5f62e..5a72402ba1f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17604,9 +17604,7 @@ tsubst_omp_clause_decl (tree decl, tree args, tsubst_flags_t complain,
 return decl;
 
   /* Handle OpenMP iterators.  */
-  if (TREE_CODE (decl) == TREE_LIST
-  && TREE_PURPOSE (decl)
-  && TREE_CODE (TREE_PURPOSE (decl)) == TREE_VEC)
+  if (OMP_ITERATOR_DECL_P (decl))
 {
   tree ret;
   if (iterator_cache[0] == TREE_PURPOSE (decl))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb46c1986c..4f856a9d749 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/

[PATCH v3 2/5] openmp: Add support for iterators in map clauses (C/C++)

2024-10-04 Thread Kwok Cheung Yeung

This patch modifies the C and C++ parsers to accept an iterator as a map 
type modifier, storing it in the OMP_CLAUSE_ITERATOR argument of the 
clause. When finishing clauses, any clauses generated from a clause with 
iterators also have the iterator applied to them.


During gimplification, check_omp_map_iterators is called to check that 
all iterator variables are referenced at some point with a clause. 
Gimplification of the clause decl and size are delayed until iterator 
expansion as they may reference iterator variables.


In lower_target, lower_omp_map_iterators is called to construct the 
expansion loop for iterator clauses. Clauses using the same set of 
iterators reuse the loop, though with different storage allocated for 
them. lower_omp_map_iterator_expr is called to add the final expression 
that is sent as the hostaddr for libgomp to the loop, and a reference to 
the array generated by the iterator loop is returned to replace the 
original expression. lower_omp_map_iterator_size works similarly for the 
clause size. finish_omp_map_iterators is called later to finalise the loop.


Libgomp has a new function gomp_merge_iterator_maps which identifies 
data coming from an iterator, and effectively creates new maps 
on-the-fly from the iterator info array, inserting them into the list of 
mappings at the point where iterator data occurred. As there are now 
multiple maps where one was previously, an entry is only added to the 
target vars for the first expanded map, otherwise it will get out of 
sync with the expected layout and the wrong variables will be picked up 
by the target function.

From 50557e513ca534ba32f50d1b056a07a6f671 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 4 Oct 2024 15:16:12 +0100
Subject: [PATCH 2/5] openmp: Add support for iterators in map clauses (C/C++)

This adds preliminary support for iterators in map clauses within OpenMP
'target' constructs (which includes constructs such as 'target enter data').

Iterators with non-constant loop bounds are not currently supported.

2024-10-04  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_clause_map): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Finish iterators.  Apply
iterators to generated clauses.

gcc/cp/
* parser.cc (cp_parser_omp_clause_map): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Finish iterators.  Apply
iterators to generated clauses.

gcc/
* gimplify.cc (compute_iterator_count): Make non-static.  Take an
iterator instead of a clause for an operand.
(build_iterator_loop): Likewise.
(gimplify_omp_depend): Pass iterator in call to compute_iterator_count
and build_iterator_loop.
(find_var_decl): New.
(check_omp_map_iterators): New.
(gimplify_scan_omp_clauses): Call check_omp_map_iterators on clauses
with iterators.
(gimplify_adjust_omp_clauses): Skip gimplification of clause decl and
size for clauses with iterators.
* omp-low.cc (struct iterator_loop_info_t): New type.
(iterator_loop_map_t): New type.
(lower_omp_map_iterators): New.
(lower_omp_map_iterator_expr): New.
(lower_omp_map_iterator_size): New.
(finish_omp_map_iterators): New.
(lower_omp_target): Call lower_omp_map_iterators on clauses with
iterators.  Call lower_omp_map_iterator_expr before assigning to
sender ref.  Call lower_omp_map_iterator_size before setting the
size.  Call finish_omp_map_iterators.  Insert statements generated
during iterator expansion before the statements for the target
clause.
* tree-pretty-print.cc (dump_omp_clause): Call dump_omp_iterators
for iterators in map clauses.
* tree.cc (omp_clause_num_ops): Add operand for OMP_CLAUSE_MAP.
(walk_tree_1): Do not walk last operand of OMP_CLAUSE_MAP.
* tree.h (OMP_CLAUSE_HAS_ITERATORS): New.
(OMP_CLAUSE_ITERATORS): New.

gcc/testsuite/
* c-c++-common/gomp/map-6.c (foo): Amend expected error message.
* c-c++-common/gomp/target-map-iterators-1.c: New.
* c-c++-common/gomp/target-map-iterators-2.c: New.
* c-c++-common/gomp/target-map-iterators-3.c: New.

libgomp/
* target.c (kind_to_name): New.
(gomp_merge_iterator_maps): New.
(gomp_map_vars_internal): Call gomp_merge_iterator_maps.  Copy
address of only the first iteration to target vars.  Free allocated
variables.
* testsuite/libgomp.c-c++-common/target-map-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-3.c: New.
---
 gcc/c/c-parser.cc |  59 +-
 gcc/c/c-typeck.cc |  22 ++-
 gcc/cp/parser.cc  |  62 +

Re: [patch,avr] Implement TARGET_FLOATN_MODE

2024-10-04 Thread Georg-Johann Lay


On 04.10.24 at 16:32, Jakub Jelinek wrote:
> On Fri, Oct 04, 2024 at 08:09:48AM -0600, Jeff Law wrote:
> > On 10/4/24 7:46 AM, Georg-Johann Lay wrote:
> > > This patch implements TARGET_FLOATN_MODE which maps
> > > _Float32[x] to SFmode and _Float64[x] to DFmode.
> > >
> > > There is currently no library support for extended float types,
> > > but these settings are more reasonable for avr (and they make
> > > more tests pass).
> > >
> > > Ok for trunk?
> > >
> > > Johann
> > >
> > > --
> > >
> > > AVR: Implement TARGET_FLOATN_MODE.
> > >
> > > gcc/
> > >   * config/avr/avr.cc (avr_floatn_mode): New static function.
> > >   (TARGET_FLOATN_MODE): New define.
> >
> > OK
>
> This is certainly incorrect.
>
> As specified by e.g. ISO C23 H.2.3 Extended floating types, the requirement
> on the extended floating types is:
> "For each of its basic formats, IEC 60559 specifies an extended format
> whose maximum exponent and precision exceed those of the basic format it
> is associated with. Extended formats are intended for arithmetic with
> more precision and exponent range than is available in the basic formats
> used for the input data."
> So, while SFmode is a good mode to use for _Float32 and DFmode is a good
> mode to use for _Float64, SFmode isn't a good mode to use for _Float32x and
> neither is DFmode a good mode to use for _Float64x.
> I'd expect you want DFmode for _Float32x and opt_scalar_float_mode () for
> _Float64x.
>
> Jakub


Thanks for the clarification.

So I guess that hook is not needed at all, and the default
implementation is already the best avr can do.

Johann

Re: [PATCH 2/3] Release expanded template argument vector

2024-10-04 Thread Patrick Palka

On Thu, 3 Oct 2024, Jason Merrill wrote:

> On 10/3/24 12:38 PM, Jason Merrill wrote:
> > On 10/2/24 7:50 AM, Richard Biener wrote:
> > > This reduces peak memory usage by 20% for a specific testcase.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > 
> > > It's very ugly so I'd appreciate suggestions on how to handle such
> > > situations better?
> > 
> > I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.
> 
> OK, apparently that was both too clever and not clever enough. Replacing it
> with this one that's much closer to yours.
> 
> Jason

> From: Jason Merrill 
> Date: Thu, 3 Oct 2024 16:31:00 -0400
> Subject: [PATCH] c++: free garbage vec in coerce_template_parms
> To: gcc-patches@gcc.gnu.org
> 
> coerce_template_parms can create two different vecs for the inner template
> arguments, new_inner_args and (potentially) the result of
> expand_template_argument_pack.  One or the other, or possibly both, end up
> being garbage: in the typical case, the expanded vec is garbage because it's
> only used as the source for convert_template_argument.  In some dependent
> cases, the new vec is garbage because we decide to return the original args
> instead.  In these cases, ggc_free the garbage vec to reduce the memory
> overhead of overload resolution.
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (coerce_template_parms): Free garbage vecs.
> 
> Co-authored-by: Richard Biener 
> ---
>  gcc/cp/pt.cc | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 20affcd65a2..4ceae1d38de 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
>   {
> /* We don't know how many args we have yet, just use the
>unconverted (and still packed) ones for now.  */
> +   ggc_free (new_inner_args);
> new_inner_args = orig_inner_args;
> arg_idx = nargs;
> break;
> @@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
> = make_pack_expansion (conv, complain);
>  
>/* We don't know how many args we have yet, just
> - use the unconverted ones for now.  */
> +  use the unconverted (but unpacked) ones for now.  */
> +   ggc_free (new_inner_args);

I'm a bit worried about these ggc_frees.  If an earlier template
parameter is a constrained auto NTTP then new_inner_args/new_args could
have been captured by the satisfaction cache during coercion for that
argument, and so we'd be freeing a vector that's still live?

>new_inner_args = inner_args;
> arg_idx = nargs;
>break;
> @@ -9442,6 +9444,12 @@ coerce_template_parms (tree parms,
>  SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
>TREE_VEC_LENGTH (new_inner_args));
>  
> +  /* If we expanded packs in inner_args and aren't returning it now, the
> + expanded vec is garbage.  */
> +  if (inner_args != new_inner_args
> +  && inner_args != orig_inner_args)
> +ggc_free (inner_args);
> +
>return return_full_args ? new_args : new_inner_args;
>  }
>  
> -- 
> 2.46.2
>

Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved

2024-10-04 Thread Siarhei Volkau

Hello,

On Fri, 4 Oct 2024 at 16:48, Christophe Lyon  wrote:
>
> Hi!
>
>
> On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau  wrote:
> >
> > ping
> >
> > On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau  wrote:
> > >
> > > This patch deals with consequences but not the root cause though.
> > >
> > > There are 5 cases which are subject to rewriting:
> > > case #1:
> > >   mov ip, r1
> > >   add r2, ip
> > >   # ip is dead here
> > > can be rewritten as:
> > >   adds r2, r1
>
> Why replace 'add' with 'adds' ?
>
> Thanks,
>
> Christophe
>

Good catch, actually. Silly answer is:
because there's no alternative without {S} for Lo registers in thumb1.

Correct me if I'm wrong, I don't think that we have to do something
special with CC reg there because conditional execution instructions
(thumb1_cbz, cbranchsi4_insn) take care of that.
See thumb1_final_prescan_insn.

Thanks

Siarhei

> > >
> > > case #2:
> > >   add ip, r1
> > >   mov r1, ip
> > >   # ip is dead here
> > > can be rewritten as:
> > >   add r1, ip
> > >
> > > case #3:
> > >   mov ip, r1
> > >   add r2, ip
> > >   add r3, ip
> > >   # ip is dead here
> > > can be rewritten as:
> > >   adds r2, r1
> > >   adds r3, r1
> > >
> > > case #4:
> > >   mov ip, r1
> > >   add ip, r2
> > >   mov r1, ip
> > > can be rewritten as:
> > >   adds r1, r2
> > >   mov  ip, r1 <- might be eliminated too, if ip is dead
> > >
> > > case #5 (arbitrary):
> > >   mov  r1, ip
> > >   subs r2, r1, r2
> > >   mov  ip, r2
> > >   # r1 is dead here
> > > can be rewritten as:
> > >   rsbs r1, r2, #0
> > >   add  ip, r1
> > >   movs r2, ip <- might be eliminated, if r2 is dead
> > >
> > > Speed profit wasn't checked but size changes are the following:
> > >libgcc:  -132 bytes / -0.25%
> > >  libc: -1262 bytes / -0.55%
> > >  libm:  -384 bytes / -0.42%
> > > libstdc++: -2258 bytes / -0.30%
> > >
> > > No tests provided because it's hard to force GCC to emit HI_REGS
> > > in a small and straightforward function.
> > >
> > > Signed-off-by: Siarhei Volkau 
> > > ---
> > >  gcc/config/arm/thumb1.md | 93 +++-
> > >  1 file changed, 92 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > index d7074b43f60..9da4af9eccd 100644
> > > --- a/gcc/config/arm/thumb1.md
> > > +++ b/gcc/config/arm/thumb1.md
> > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn"
> > > (set_attr "conds" "clob")
> > > (set_attr "type" "multiple")]
> > >  )
> > > -
> > > +
> > > +;; bad code emitted when HI_REGS involved in addition
> > > +;; subtract also might happen rarely
> > > +
> > > +;; case #1:
> > > +;; mov ip, r1
> > > +;; add r2, ip # ip is dead after that
> > > +(define_peephole2
> > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > +   (match_operand:SI 1 "register_operand" ""))
> > > +   (set (match_operand:SI 2 "register_operand" "")
> > > +   (plus:SI (match_dup 2) (match_dup 0)))]
> > > +  "TARGET_THUMB1
> > > +&& peep2_reg_dead_p (2, operands[0])
> > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > > +  [(set (match_dup 2)
> > > +   (plus:SI (match_dup 2) (match_dup 1)))]
> > > +  "")
> > > +
> > > +;; case #2:
> > > +;; add ip, r1
> > > +;; mov r1, ip # ip is dead after that
> > > +(define_peephole2
> > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > +   (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" 
> > > "")))
> > > +   (set (match_dup 1) (match_dup 0))]
> > > +  "TARGET_THUMB1
> > > +&& peep2_reg_dead_p (2, operands[0])
> > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > > +  [(set (match_dup 1)
> > > +   (plus:SI (match_dup 1) (match_dup 0)))]
> > > +  "")
> > > +
> > > +;; case #3:
> > > +;; mov ip, r1
> > > +;; add r2, ip
> > > +;; add r3, ip # ip is dead after that
> > > +(define_peephole2
> > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > +   (match_operand:SI 1 "register_operand" ""))
> > > +   (set (match_operand:SI 2 "register_operand" "")
> > > +   (plus:SI (match_dup 2) (match_dup 0)))
> > > +   (set (match_operand:SI 3 "register_operand" "")
> > > +   (plus:SI (match_dup 3) (match_dup 0)))]
> > > +  "TARGET_THUMB1
> > > +&& peep2_reg_dead_p (3, operands[0])
> > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS"
> > > +  [(set (match_dup 2)
> > > +   (plus:SI (match_dup 2) (match_dup 1)))
> > > +   (set (match_dup 3)
> > > +   (plus:SI (match_dup 3) (match_dup 1)))]
> > > +  "")
> > > +
> > > +;; case #4:
> > > +;; mov ip, r1
> > > +;; add ip, r2
> > > +;; mov r1, ip
> > > +(define_peephole2
> > > +  [(set (match_operand:SI 0 "register_operand" "")
> > > +   (match_operand:SI 1 "register_operand" ""))
> > > +   (set (match_dup 0)
> > > +   (plus:SI (match_dup 0) (match_operand:SI 2 "register_operand" 
> > > "")))
> > > +   (set (match_dup 1)
> > > +   (match_dup 0))]
> > > +  "TARGET_THUMB1
> > > +&& REGNO_REG_CLASS (REGNO

Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Jakub Jelinek

On Fri, Oct 04, 2024 at 10:03:36AM -0400, Siddhesh Poyarekar wrote:
> > Add a new testcase that repeats 17_intro/names.cc but with
> > _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
> > https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
> > now).
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > PR libstdc++/116210
> > * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
> > that use it in the fortify wrappers.
> > * testsuite/17_intro/names_fortify.cc: New test.
> > ---
> >   libstdc++-v3/testsuite/17_intro/names.cc | 7 +++
> >   libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++
> >   2 files changed, 13 insertions(+)
> >   create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc
> > 
> > diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
> > b/libstdc++-v3/testsuite/17_intro/names.cc
> > index 6b9a3639aad..bbf45b93dee 100644
> > --- a/libstdc++-v3/testsuite/17_intro/names.cc
> > +++ b/libstdc++-v3/testsuite/17_intro/names.cc
> > @@ -377,4 +377,11 @@
> >   #undef y
> >   #endif
> > +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE
> > +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41)
> > +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052
> > +#  undef sz
> > +# endif
> > +#endif
> 
> We've backported the fix to stable branches, so the version check isn't
> really that reliable.

That doesn't matter that much.  The worst that happens is that with those
older fixed glibc versions the testing will not test that symbol.
What is more important is that it is checked on the latest glibc,
so when people test gcc with that version, they'll notice if it regresses.

Jakub

Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-04 Thread Uros Bizjak

On Fri, Oct 4, 2024 at 11:58 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The PR notes that we don't emit optimal code for C++ spaceship
> operator if the result is returned as an integer rather than the
> result just being compared against different values and different
> code executed based on that.
> So e.g. for
> template <typename T>
> auto foo (T x, T y) { return x <=> y; }
> for both floating point types, signed integer types and unsigned integer
> types.  auto in that case is std::strong_ordering or std::partial_ordering,
> which are fancy C++ abstractions around struct with signed char member
> which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial
> ordering (but for -ffast-math 2 is never the case).
> I'm afraid functions like that are fairly common and unless they are
> inlined, we really need to map the comparison to those -1, 0, 1 or
> -1, 0, 1, 2 values.
>
> Now, for floating point spaceship I've in the past already added an
> optimization (with tree-ssa-math-opts.cc discovery and named optab, the
> optab only defined on x86 though right now), which ensures there is just
> a single comparison instruction and then just tests based on flags.
> Now, if we have code like:
>   auto a = x <=> y;
>   if (a == std::partial_ordering::less)
> bar ();
>   else if (a == std::partial_ordering::greater)
> baz ();
>   else if (a == std::partial_ordering::equivalent)
> qux ();
>   else if (a == std::partial_ordering::unordered)
> corge ();
> etc., that results in decent code generation, the spaceship named pattern
> on x86 optimizes for the jumps, so emits comparisons on the flags, followed
> by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that
> well.  But if the result needs to be stored into an integer and just
> returned that way or there are no immediate jumps based on it (or turned
> into some non-standard integer values like -42, 0, 36, 75 etc.), then CE
> doesn't do a good job for that, we end up with say
> comiss  %xmm1, %xmm0
> jp  .L4
> seta%al
> movl$0, %edx
> leal-1(%rax,%rax), %eax
> cmove   %edx, %eax
> ret
> .L4:
> movl$2, %eax
> ret
> The jp is good, that is the unlikely case and can't be easily handled in
> straight line code due to the layout of the flags, but the rest uses cmov
> which often isn't a win and a weird math.
> With the patch below we can get instead
> xorl%eax, %eax
> comiss  %xmm1, %xmm0
> jp  .L2
> seta%al
> sbbl$0, %eax
> ret
> .L2:
> movl$2, %eax
> ret
>
> The patch changes the discovery in the generic code, by detecting if
> the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
> -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
> a new argument to .SPACESHIP ifn, so that the named pattern is told whether
> it should optimize for branches or for loading the result into a -1, 0, 1
> (, 2) integer.  Additionally, it doesn't detect just floating point <=>
> anymore, but also integer and unsigned integer, but in those cases only
> if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
> result in good code).
> The backend then can for those integer or unsigned integer <=>s return
> effectively (x > y) - (x < y) in a way that is efficient on the target
> (so for x86 with ensuring zero initialization first when needed before
> setcc; one for floating point and unsigned, where there is just one setcc
> and the second one optimized into sbb instruction, two for the signed int
> case).  So e.g. for signed int we now emit
> xorl%edx, %edx
> xorl%eax, %eax
> cmpl%esi, %edi
> setl%dl
> setg%al
> subl%edx, %eax
> ret
> and for unsigned
> xorl%eax, %eax
> cmpl%esi, %edi
> seta%al
> sbbb$0, %al
> ret
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, I wonder if other targets wouldn't benefit from defining the
> named optab too...
>
> 2024-10-04  Jakub Jelinek  
>
> PR middle-end/116896
> * optabs.def (spaceship_optab): Use spaceship$a4 rather than
> spaceship$a3.
> * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
> rather than 2, expand the last one, expect 4 operands of
> spaceship_optab.
> * tree-ssa-math-opts.cc: Include cfghooks.h.
> (optimize_spaceship): Check if a single PHI is initialized to
> -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
> argument to .SPACESHIP and optimize away the comparisons,
> otherwise pass 0.  Also check for integer comparisons rather than
> floating point, in that case do it only if there is a single PHI
> with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
> if t
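
For readers following the quoted description, here is a minimal sketch of the `(x > y) - (x < y)` formulation and of the extra value 2 used for the unordered case. It is only an illustration, not code from the patch: the function names are hypothetical, and the reference function spells out the -1, 0, 1 mapping by hand instead of using `<=>`, so that it also compiles before C++20.

```cpp
#include <limits>

// Hypothetical reference: the -1, 0, 1 value that a strong ordering
// result carries, written with explicit branches.
constexpr int three_way_ref(int x, int y) {
  return x < y ? -1 : (x > y ? 1 : 0);
}

// Branch-free form mentioned in the quoted message: (x > y) - (x < y).
// Each comparison yields 0 or 1, so the difference is -1, 0 or 1.
constexpr int setcc_form(int x, int y) {
  return (x > y) - (x < y);
}

// Same idea for unsigned operands.
constexpr int setcc_form_unsigned(unsigned x, unsigned y) {
  return (x > y) - (x < y);
}

// Floating point: a NaN operand makes the comparison unordered, which
// the quoted message encodes as 2; otherwise the same formula applies.
constexpr int setcc_form_fp(float x, float y) {
  return (x != x || y != y) ? 2 : (x > y) - (x < y);
}

static_assert(setcc_form(1, 2) == three_way_ref(1, 2), "less");
static_assert(setcc_form(2, 2) == three_way_ref(2, 2), "equal");
static_assert(setcc_form(3, 2) == three_way_ref(3, 2), "greater");
static_assert(setcc_form_fp(1.0f, std::numeric_limits<float>::quiet_NaN()) == 2,
              "unordered");
```

Whether a compiler turns this source form into the setcc/sbb sequences shown in the quoted message depends on the target and on the optimisation being discussed; the sketch only pins down the values involved.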

Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Jonathan Wakely

On Fri, 4 Oct 2024 at 15:05, Siddhesh Poyarekar  wrote:
>
> On 2024-10-04 07:52, Jonathan Wakely wrote:
> > This doesn't really belong in our testsuite, because the sole purpose of
> > the new test is to find bugs in the Glibc wrappers (like the one linked
> > below). But maybe it's a kindness to do it in our testsuite, because we
> > already have this test in place, and one Glibc bug was already found
> > thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
> >
> > Should we do this?
> >
> > -- >8 --
> >
> > Add a new testcase that repeats 17_intro/names.cc but with
> > _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
> > https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
> > now).
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/116210
> >   * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
> >   that use it in the fortify wrappers.
> >   * testsuite/17_intro/names_fortify.cc: New test.
> > ---
> >   libstdc++-v3/testsuite/17_intro/names.cc | 7 +++
> >   libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++
> >   2 files changed, 13 insertions(+)
> >   create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc
> >
> > diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
> > b/libstdc++-v3/testsuite/17_intro/names.cc
> > index 6b9a3639aad..bbf45b93dee 100644
> > --- a/libstdc++-v3/testsuite/17_intro/names.cc
> > +++ b/libstdc++-v3/testsuite/17_intro/names.cc
> > @@ -377,4 +377,11 @@
> >   #undef y
> >   #endif
> >
> > +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE
> > +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41)
> > +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052
> > +#  undef sz
> > +# endif
> > +#endif
>
> We've backported the fix to stable branches, so the version check isn't
> really that reliable.


Yeah, but it doesn't matter if we #undef sz on some Glibc systems that
don't actually have the bug.
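
To make the version window in the guard concrete, here is a minimal sketch. The function `glibc_prereq` is a hypothetical stand-in modelled on the `__GLIBC_PREREQ` macro, taking the installed version as explicit arguments so the logic can be checked on any system; `undefs_sz` is likewise an illustrative name, not something from the patch.

```cpp
// Hypothetical stand-in for __GLIBC_PREREQ(major, minor): true when the
// installed version (have_major.have_minor) is at least major.minor.
constexpr bool glibc_prereq(int have_major, int have_minor,
                            int major, int minor) {
  return ((have_major << 16) + have_minor) >= ((major << 16) + minor);
}

// Mirrors the guard in the patch: sz is undefined for glibc versions
// from 2.35 up to, but not including, 2.41.
constexpr bool undefs_sz(int have_major, int have_minor) {
  return glibc_prereq(have_major, have_minor, 2, 35)
      && !glibc_prereq(have_major, have_minor, 2, 41);
}

static_assert(!undefs_sz(2, 34), "before the affected range");
static_assert(undefs_sz(2, 35), "first affected release");
static_assert(undefs_sz(2, 40), "last release inside the window");
static_assert(!undefs_sz(2, 41), "sz is checked again from 2.41 on");
```

A 2.40 system that carries the backported fix still falls inside the window, so `sz` simply goes untested there, which is the harmless outcome described above.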

Re: [patch,avr] Implement TARGET_FLOATN_MODE

2024-10-04 Thread Jakub Jelinek

On Fri, Oct 04, 2024 at 08:09:48AM -0600, Jeff Law wrote:
> 
> 
> On 10/4/24 7:46 AM, Georg-Johann Lay wrote:
> > This patch implements TARGET_FLOATN_MODE which maps
> > _Float32[x] to SFmode and _Float64[x] to DFmode.
> > 
> > There is currently no library support for extended float types,
> > but these settings are more reasonable for avr (and they make
> > more tests pass).
> > 
> > Ok for trunk?
> > 
> > Johann
> > 
> > -- 
> > 
> > AVR: Implement TARGET_FLOATN_MODE.
> > 
> > gcc/
> >  * config/avr/avr.cc (avr_floatn_mode): New static function.
> >  (TARGET_FLOATN_MODE): New define.
> OK

This is certainly incorrect.

As specified by e.g. ISO C23 H.2.3 Extended floating types, the requirement
on the extended floating types is:
"For each of its basic formats, IEC 60559 specifies an extended format whose
maximum exponent and precision exceed those of the basic format it is
associated with. Extended formats are intended for arithmetic with more
precision and exponent range than is available in the basic formats used for
the input data."
So, while SFmode is a good mode to use for _Float32 and DFmode is a good
mode to use for _Float64, SFmode isn't a good mode to use for _Float32x and
neither is DFmode a good mode to use for _Float64x.
I'd expect you want DFmode for _Float32x and opt_scalar_float_mode () for
_Float64x.

Jakub
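
The requirement quoted above can be checked mechanically. Below is a minimal sketch under two stated assumptions: the host's `float` and `double` are IEEE binary32 and binary64 (standing in for SFmode and DFmode), and the only criterion modelled is the quoted one, that precision and maximum exponent must both exceed those of the basic format. `FloatFormat`, `format_of` and `can_extend` are hypothetical names, and any stricter minimums in the standard are not modelled.

```cpp
#include <limits>

// Hypothetical descriptor for a binary floating-point format.
struct FloatFormat {
  int precision;     // significand bits, including the implicit bit
  int max_exponent;  // maximum exponent, numeric_limits convention
};

// Reads the format parameters of a host type.
template <typename T>
constexpr FloatFormat format_of() {
  return FloatFormat{std::numeric_limits<T>::digits,
                     std::numeric_limits<T>::max_exponent};
}

// True if `candidate` exceeds `basic` in both precision and exponent
// range, i.e. it could serve as the extended format for `basic`.
constexpr bool can_extend(FloatFormat basic, FloatFormat candidate) {
  return candidate.precision > basic.precision
      && candidate.max_exponent > basic.max_exponent;
}

// A format never extends itself: SFmode cannot back _Float32x.
static_assert(!can_extend(format_of<float>(), format_of<float>()),
              "same format is not an extended format");
// Likewise DFmode cannot back _Float64x.
static_assert(!can_extend(format_of<double>(), format_of<double>()),
              "same format is not an extended format");
```

On a host where `double` really is binary64, `can_extend(format_of<float>(), format_of<double>())` evaluates to true, which matches the suggestion of DFmode for _Float32x.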

Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-10-04 Thread Thiago Jung Bauermann

Hello Christophe,

Christophe Lyon  writes:

> On Fri, 1 Mar 2024 at 15:29, Richard Earnshaw (lists)
>  wrote:
>>
>> On 01/03/2024 14:23, Andre Vieira (lists) wrote:
>> > Hi Thiago,
>> >
>> > Thanks for this, LGTM but I can't approve this, CC'ing Richard.
>> >
>> > Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite'
>> > from bullet points 2-4.
>> >
>>
>> Yes, this is OK with the change Andre mentioned (your push will fail if you
>> don't fix that).
>>
>> PS, if you've set up GCC git customizations (see
>> contrib/gcc-git-customization.sh), you can verify things like this with
>> 'git gcc-verify HEAD^..HEAD'
>>
>
> ISTM you have forgotten to commit this patch.
> If you don't have commit rights, I can do it for you.

That is true, sorry about that. I just pushed the patch as commit
115857bf1e32.

It incorporates Andre's ChangeLog fix and git gcc-verify says the commit
is OK.

Thank you for reminding me.

-- 
Thiago

Re: [PATCH] aarch64: Fix bug with max/min (PR116934)

2024-10-04 Thread Wilco Dijkstra

Hi Saurabh,

This looks good, one little nit:

> gcc/ChangeLog:
>
>     * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
>     UNSPEC_COND_SMIN to correct iterators.

This should also have the PR target/116934 before it - it's fine to change it 
when you commit.

Speaking of which, can we try getting this committed before the weekend so the
benchmark runs will work again?

Cheers,
Wilco

[patch,testsuite] Some float64 and float32x test require double64plus.

2024-10-04 Thread Georg-Johann Lay


Some of the float64 and float32x test cases are using double built-ins
and hence require double64plus resp. double_float32xplus, i.e. double
is at least as good as float32x.

This patch adds the corresponding dg-require-effective-target filters.
(But only for test cases where I can verify that they are working
with double64+ but are failing with double32.)

Ok for trunk?

Johann

--

testsuite - Some float64 and float32x test require double64plus.

Some of the float64 and float32x test cases are using double built-ins
and hence require double64plus resp. that double is at least as good
as float32x (double_float32xplus).

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_double_float32xplus):
New proc.
* gcc.dg/torture/float32x-builtin.c: Add
dg-require-effective-target double_float32xplus.
* gcc.dg/torture/float32x-tg-2.c: Same.
* gcc.dg/torture/float32x-tg.c: Same.
* gcc.dg/torture/float64-builtin.c: Add
dg-require-effective-target double64plus.
* gcc.dg/torture/float64-tg-2.c: Same.
* gcc.dg/torture/float64-tg.c: Same.

diff --git a/gcc/testsuite/gcc.dg/torture/float32x-builtin.c b/gcc/testsuite/gcc.dg/torture/float32x-builtin.c
index 71eb7e2cdc8..0404d392705 100644
--- a/gcc/testsuite/gcc.dg/torture/float32x-builtin.c
+++ b/gcc/testsuite/gcc.dg/torture/float32x-builtin.c
@@ -4,6 +4,7 @@
 /* { dg-add-options float32x } */
 /* { dg-add-options ieee } */
 /* { dg-require-effective-target float32x_runtime } */
+/* { dg-require-effective-target double_float32xplus } */
 
 #define WIDTH 32
 #define EXT 1
diff --git a/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c b/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c
index 6179aba7cdd..dd7e2064a1a 100644
--- a/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c
+++ b/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c
@@ -4,6 +4,7 @@
 /* { dg-add-options float32x } */
 /* { dg-add-options ieee } */
 /* { dg-require-effective-target float32x_runtime } */
+/* { dg-require-effective-target double_float32xplus } */
 
 #define WIDTH 32
 #define EXT 1
diff --git a/gcc/testsuite/gcc.dg/torture/float32x-tg.c b/gcc/testsuite/gcc.dg/torture/float32x-tg.c
index b65b03f558b..87d9bef2b03 100644
--- a/gcc/testsuite/gcc.dg/torture/float32x-tg.c
+++ b/gcc/testsuite/gcc.dg/torture/float32x-tg.c
@@ -4,6 +4,7 @@
 /* { dg-add-options float32x } */
 /* { dg-add-options ieee } */
 /* { dg-require-effective-target float32x_runtime } */
+/* { dg-require-effective-target double_float32xplus } */
 
 #define WIDTH 32
 #define EXT 1
diff --git a/gcc/testsuite/gcc.dg/torture/float64-builtin.c b/gcc/testsuite/gcc.dg/torture/float64-builtin.c
index 413768443ae..2462017e4d5 100644
--- a/gcc/testsuite/gcc.dg/torture/float64-builtin.c
+++ b/gcc/testsuite/gcc.dg/torture/float64-builtin.c
@@ -4,6 +4,7 @@
 /* { dg-add-options float64 } */
 /* { dg-add-options ieee } */
 /* { dg-require-effective-target float64_runtime } */
+/* { dg-require-effective-target double64plus } */
 
 #define WIDTH 64
 #define EXT 0
diff --git a/gcc/testsuite/gcc.dg/torture/float64-tg-2.c b/gcc/testsuite/gcc.dg/torture/float64-tg-2.c
index d0e4316611f..f034e76cfeb 100644
--- a/gcc/testsuite/gcc.dg/torture/float64-tg-2.c
+++ b/gcc/testsuite/gcc.dg/torture/float64-tg-2.c
@@ -4,6 +4,7 @@
 /* { dg-add-options float64 } */
 /* { dg-add-options ieee } */
 /* { dg-require-effective-target float64_runtime } */
+/* { dg-require-effective-target double64plus } */
 
 #define WIDTH 64
 #define EXT 0
diff --git a/gcc/testsuite/gcc.dg/torture/float64-tg.c b/gcc/testsuite/gcc.dg/torture/float64-tg.c
index a7188312d57..d17ee0ecb19 100644
--- a/gcc/testsuite/gcc.dg/torture/float64-tg.c
+++ b/gcc/testsuite/gcc.dg/torture/float64-tg.c
@@ -4,6 +4,7 @@
 /* { dg-add-options float64 } */
 /* { dg-add-options ieee } */
 /* { dg-require-effective-target float64_runtime } */
+/* { dg-require-effective-target double64plus } */
 
 #define WIDTH 64
 #define EXT 0
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f92f7f1af9c..459af8e58c6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3965,6 +3

[PATCH 0/8] aarch64: Add new flags for existing features

2024-10-04 Thread Andrew Carlotti

This patch series adds 7 new flags for features that were previously available
in GCC only as part of an architecture version.  It also fixes one other
instance where an architecture version was used in a check instead of a feature
flag.

Bootstrapped and regression tested as a whole on aarch64.  I additionally ran
the cpunative tests after each patch in the series.  Ok for master?

Re: [PATCH] libstdc++: Implement LWG 3664 changes to ranges::distance

2024-10-04 Thread Jonathan Wakely

On Fri, 4 Oct 2024 at 19:37, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/backports?

OK for all branches (assuming we already have LWG 3392 on the branches).


>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_base.h (__distance_fn::operator()):
> Adjust iterator/sentinel overloads as per LWG 3664.
> * testsuite/24_iterators/range_operations/distance.cc:
> Test LWG 3664 example.
> ---
>  libstdc++-v3/include/bits/ranges_base.h| 14 +++---
>  .../24_iterators/range_operations/distance.cc  | 11 +++
>  2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/ranges_base.h 
> b/libstdc++-v3/include/bits/ranges_base.h
> index 137c3c98e14..cb2eba1f841 100644
> --- a/libstdc++-v3/include/bits/ranges_base.h
> +++ b/libstdc++-v3/include/bits/ranges_base.h
> @@ -947,7 +947,9 @@ namespace ranges
>
>struct __distance_fn final
>{
> -template _Sent>
> +// _GLIBCXX_RESOLVE_LIB_DEFECTS
> +// 3664. LWG 3392 broke std::ranges::distance(a, a+3)
> +template _Sent>
>requires (!sized_sentinel_for<_Sent, _It>)
>constexpr iter_difference_t<_It>
>operator()[[nodiscard]](_It __first, _Sent __last) const
> @@ -961,13 +963,11 @@ namespace ranges
> return __n;
>}
>
> -template _Sent>
> +template> _Sent>
>[[nodiscard]]
> -  constexpr iter_difference_t<_It>
> -  operator()(const _It& __first, const _Sent& __last) const
> -  {
> -   return __last - __first;
> -  }
> +  constexpr iter_difference_t>
> +  operator()(_It&& __first, _Sent __last) const
> +  { return __last - static_cast&>(__first); }
>
>  template
>[[nodiscard]]
> diff --git a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc 
> b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> index 9a1d0c3efe8..336956936c2 100644
> --- a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> +++ b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> @@ -144,6 +144,16 @@ test05()
>VERIFY( std::ranges::distance(c4) == 5 );
>  }
>
> +void
> +test06()
> +{
> +  // LWG 3664 - LWG 3392 broke std::ranges::distance(a, a+3)
> +  int a[] = {1, 2, 3};
> +  VERIFY( std::ranges::distance(a, a+3) == 3 );
> +  VERIFY( std::ranges::distance(a, a) == 0 );
> +  VERIFY( std::ranges::distance(a+3, a) == -3 );
> +}
> +
>  int
>  main()
>  {
> @@ -152,4 +162,5 @@ main()
>test03();
>test04();
>test05();
> +  test06();
>  }
> --
> 2.47.0.rc1
>

Re: [PATCH 0/8] aarch64: Add new flags for existing features

2024-10-04 Thread Andrew Pinski

On Fri, Oct 4, 2024 at 10:51 AM Andrew Carlotti  wrote:
>
> This patch series adds 7 new flags for features that were previously available
> in GCC only as part of an architecture version.  It also fixes one other
> instance where an architecture version was used in a check instead of a 
> feature
> flag.
>
> Bootstrapped and regression tested as a whole on aarch64.  I additionally ran
> the cpunative tests after each patch in the series.  Ok for master?

I think this is good, except that the documentation is not updated.
Yes, the feature flags are documented; see
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#g_t-march-and--mcpu-Feature-Modifiers

Thanks,
Andrew

[COMMITTED] MAINTAINERS: Add myself to write after approval

2024-10-04 Thread Thiago Jung Bauermann

ChangeLog:
* MAINTAINERS: Add myself to write after approval.
---

Hello,

I just noticed that I wasn't yet in the write after approval section,
so I just committed this patch.

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ded5b3d4f643..9257b33ff089 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -344,6 +344,7 @@ Richard Ballricbal02

 Scott Bambrough -   
 Wolfgang Bangerth   -   
 Gergö Barany-   
+Thiago Jung Bauermann   -   
 Charles Baylis  cbaylis 
 Tejas Belagod   belagod 
 Andrey Belevantsev  abel

base-commit: 385a232229a5b4ee3f4d2a2472bcda28cd8d17b2

[PATCH 6/8] aarch64: Add new +rcpc2 flag

2024-10-04 Thread Andrew Carlotti

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2.
* config/aarch64/aarch64-option-extensions.def
(RCPC2): New flag.
(RCPC3): Add RCPC2 dependency.
* config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to
expected feature string instead of rcpc.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.


diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
84782d55089650b5854c60497bc68f9564d6f90b..f182d3dc6c77bf63ab272ab1b5824c1523390e09
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -34,7 +34,7 @@ AARCH64_ARCH("armv8-a",   generic_armv8_a,   V8A,   
8,  (SIMD))
 AARCH64_ARCH("armv8.1-a", generic_armv8_a,   V8_1A, 8,  (V8A, LSE, 
CRC, RDMA))
 AARCH64_ARCH("armv8.2-a", generic_armv8_a,   V8_2A, 8,  (V8_1A))
 AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, PAUTH, 
RCPC, FCMA, JSCVT))
-AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM))
+AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM, RCPC2))
 AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES, FRINTTS, FLAGM2))
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
b73324abbeb6145b5a2c26fdb22f41de9b6045d9..b929773eba176a391d6e9242067e4f63e4434637
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -159,7 +159,9 @@ AARCH64_OPT_FMV_EXTENSION("fcma", FCMA, (SIMD), (), (), 
"fcma")
 
 AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
-AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
+AARCH64_OPT_FMV_EXTENSION("rcpc2", RCPC2, (RCPC), (), (), "ilrcpc")
+
+AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC2), (), (), "lrcpc3")
 
 AARCH64_OPT_FMV_EXTENSION("frintts", FRINTTS, (FP), (), (), "frint")
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
41430466b50bf223bf008c753d24f57570c1f2e5..3ed1930d3e4ac9f250219a43aa91cb8ed123f53c
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -427,7 +427,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 
 /* The RCPC2 extensions from Armv8.4-a that allow immediate offsets to LDAPR
and sign-extending versions.*/
-#define TARGET_RCPC2 ((AARCH64_HAVE_ISA (V8_4A) && TARGET_RCPC) || 
TARGET_RCPC3)
+#define TARGET_RCPC2 AARCH64_HAVE_ISA (RCPC2)
 
 /* RCPC3 (Release Consistency) extensions, optional from Armv8.2-a.  */
 #define TARGET_RCPC3 AARCH64_HAVE_ISA (RCPC3)
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
index 
c1d5896e1eb0b3b48ac0c1eeb95a74c4b6ec9e85..904cdf452263961442f3ecc31cd1b6563130f9c7
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n}
 } } */
+/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc2\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n}
 } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
values.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
index 
4533a2bf5912dc609327b63164ba4577e98f9eec..feb959b11b0e383a5e1f3214d55f80f56d2605d4
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n}
 } } */
+/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc2\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n}
 } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
values and that it enables optional features.  */
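
As a rough illustration of what the TARGET_RCPC2 change above does, here is a minimal sketch that models the flags as bits and applies the dependencies added in this patch. It is a simplification, not GCC code: the constants and functions are hypothetical, and the architecture level is reduced to a single V8_4A bit that directly implies RCPC2.

```cpp
#include <cstdint>

// Hypothetical bit assignments for the flags involved.
constexpr std::uint32_t kRcpc  = 1u << 0;
constexpr std::uint32_t kRcpc2 = 1u << 1;
constexpr std::uint32_t kRcpc3 = 1u << 2;
constexpr std::uint32_t kV8_4A = 1u << 3;

// If `flags` contains `has`, also enable `adds`.
constexpr std::uint32_t add_if(std::uint32_t flags, std::uint32_t has,
                               std::uint32_t adds) {
  return (flags & has) ? (flags | adds) : flags;
}

// Dependencies after the patch: V8_4A -> RCPC2, RCPC3 -> RCPC2,
// RCPC2 -> RCPC (applied in an order that lets them chain).
constexpr std::uint32_t close_deps(std::uint32_t flags) {
  return add_if(add_if(add_if(flags, kV8_4A, kRcpc2), kRcpc3, kRcpc2),
                kRcpc2, kRcpc);
}

// Old definition: (V8_4A && RCPC) || RCPC3.
constexpr bool old_target_rcpc2(std::uint32_t flags) {
  return ((flags & kV8_4A) && (flags & kRcpc)) || (flags & kRcpc3);
}

// New definition: just test the RCPC2 flag.
constexpr bool new_target_rcpc2(std::uint32_t flags) {
  return (flags & kRcpc2) != 0;
}

static_assert(old_target_rcpc2(close_deps(kV8_4A))
              && new_target_rcpc2(close_deps(kV8_4A)), "armv8.4-a");
static_assert(old_target_rcpc2(close_deps(kRcpc3))
              && new_target_rcpc2(close_deps(kRcpc3)), "+rcpc3");
static_assert(!old_target_rcpc2(close_deps(kRcpc))
              && !new_target_rcpc2(close_deps(kRcpc)), "+rcpc only");
```

In this model the two definitions agree for armv8.4-a, +rcpc3 and plain +rcpc. The one configuration where they differ is an explicit +rcpc2 on an older architecture, which only the new flag can express.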

[PATCH 8/8] aarch64: Add new +xs flag

2024-10-04 Thread Andrew Carlotti

GCC does not emit tlbi instructions, so this only affects the flags
passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_7A): Add XS.
* config/aarch64/aarch64-option-extensions.def (XS): New flag.


diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
fa06377dda089c8a89628bc4cc66d54510346053..66fe5cef0896847715d3b0a404ebabedfc82f34d
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -37,7 +37,7 @@ AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 
8,  (V8_2A, PAUTH, R
 AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM, RCPC2))
 AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES, FRINTTS, FLAGM2))
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
-AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, WFXT))
+AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, WFXT, 
XS))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A, CSSC))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
9781d48f63778d186b66427bae7deb2c01e14107..93adb556276c2379f50805d40d891229c87e1783
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -222,6 +222,8 @@ AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
 
 AARCH64_OPT_FMV_EXTENSION("wfxt", WFXT, (), (), (), "wfxt")
 
+AARCH64_OPT_EXTENSION("xs", XS, (), (), (), "")
+
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
 AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))

[PATCH 1/8] aarch64: Use PAUTH instead of V8_3A in some places

2024-10-04 Thread Andrew Carlotti

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_expand_epilogue): Use TARGET_PAUTH.
* config/aarch64/aarch64.md: Update comment.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
e7bb3278a27eca44c46afd26069d608218198a54..cf1107127fd5d9e12ad42441528666bf6b733f73
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -10042,12 +10042,12 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
1) Sibcalls don't return in a normal way, so if we're about to call one
   we must authenticate.
 
-   2) The RETAA instruction is not available before ARMv8.3-A, so if we are
-  generating code for !TARGET_ARMV8_3 we can't use it and must
+   2) The RETAA instruction is not available without FEAT_PAuth, so if we
+  are generating code for !TARGET_PAUTH we can't use it and must
   explicitly authenticate.
 */
   if (aarch64_return_address_signing_enabled ()
-  && (sibcall || !TARGET_ARMV8_3))
+  && (sibcall || !TARGET_PAUTH))
 {
   switch (aarch64_ra_sign_key)
{
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
c54b29cd64b9e0dc6c6d12735049386ccedc5408..0940a84f9295ee2bc07282b150095fdb5af11a4d
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7672,10 +7672,10 @@
 )
 
 ;; Pointer authentication patterns are always provided.  In architecture
-;; revisions prior to ARMv8.3-A these HINT instructions operate as NOPs.
+;; revisions prior to FEAT_PAuth these HINT instructions operate as NOPs.
 ;; This lets the user write portable software which authenticates pointers
-;; when run on something which implements ARMv8.3-A, and which runs
-;; correctly, but does not authenticate pointers, where ARMv8.3-A is not
+;; when run on something which implements FEAT_PAuth, and which runs
+;; correctly, but does not authenticate pointers, where FEAT_PAuth is not
 ;; implemented.
 
 ;; Signing/Authenticating R30 using SP as the salt.

[PATCH 5/8] aarch64: Add new +flagm2 flag

2024-10-04 Thread Andrew Carlotti

GCC does not currently emit the axflag or xaflag instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2.
* config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to
expected feature string instead of flagm.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.


diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
668e7833bd81a7d8795df022f205ca7ca0d0ddef..84782d55089650b5854c60497bc68f9564d6f90b
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -35,7 +35,7 @@ AARCH64_ARCH("armv8.1-a", generic_armv8_a,   V8_1A, 
8,  (V8A, LSE, CRC,
 AARCH64_ARCH("armv8.2-a", generic_armv8_a,   V8_2A, 8,  (V8_1A))
 AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, PAUTH, 
RCPC, FCMA, JSCVT))
 AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM))
-AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES, FRINTTS))
+AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES, FRINTTS, FLAGM2))
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
505f1fb721c64e4b55b52baf465024a57c68ab98..b73324abbeb6145b5a2c26fdb22f41de9b6045d9
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -103,6 +103,8 @@ AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
 
 AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
 
+AARCH64_OPT_FMV_EXTENSION("flagm2", FLAGM2, (FLAGM), (), (), "flagm2")
+
 AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
 
 AARCH64_OPT_FMV_EXTENSION("fp", FP, (), (), (), "fp")
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
index 
aa70d1d22b8299befcd81a696f051eb72997d548..c1d5896e1eb0b3b48ac0c1eeb95a74c4b6ec9e85
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n}
 } } */
+/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n}
 } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
values.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
index 
ccd5d0d9bb7d7bf722bcffcc14c46d88d3223cf3..4533a2bf5912dc609327b63164ba4577e98f9eec
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n}
 } } */
+/* { dg-final { scan-assembler {\.arch 
armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n}
 } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
values and that it enables optional features.  */

[PATCH 7/8] aarch64: Add new +wfxt flag

2024-10-04 Thread Andrew Carlotti

GCC does not currently emit the wfet or wfit instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_7A): Add WFXT.
* config/aarch64/aarch64-option-extensions.def (WFXT): New flag.


diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
f182d3dc6c77bf63ab272ab1b5824c1523390e09..fa06377dda089c8a89628bc4cc66d54510346053
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -37,7 +37,7 @@ AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 
8,  (V8_2A, PAUTH, R
 AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM, RCPC2))
 AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES, FRINTTS, FLAGM2))
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
-AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
+AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, WFXT))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A, CSSC))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
b929773eba176a391d6e9242067e4f63e4434637..9781d48f63778d186b66427bae7deb2c01e14107
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -220,6 +220,8 @@ AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca 
pacg")
 
 AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
 
+AARCH64_OPT_FMV_EXTENSION("wfxt", WFXT, (), (), (), "wfxt")
+
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
 AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))

Re: [PATCH 1/2] aarch64: Split FCMA feature bit from Armv8.3-A

2024-10-04 Thread Andrew Carlotti

On Wed, Oct 02, 2024 at 06:13:38PM +0100, Andre Vieira wrote:
> 
> This patch splits out FCMA as a feature from Armv8.3-A and adds it as a 
> separate
> feature bit which now controls 'TARGET_COMPLEX'.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-arches.def (FCMA): New feature bit, can not be
>   used as an extension in the command-line.
>   * config/aarch64/aarch64.h (TARGET_COMPLEX): Use FCMA feature bit
>   rather than ARMV8_3.
> ---
>  gcc/config/aarch64/aarch64-arches.def| 2 +-
>  gcc/config/aarch64/aarch64-option-extensions.def | 1 +
>  gcc/config/aarch64/aarch64.h | 2 +-
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 

> diff --git a/gcc/config/aarch64/aarch64-arches.def 
> b/gcc/config/aarch64/aarch64-arches.def
> index 4634b272e28..fadf9c36b03 100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -33,7 +33,7 @@
>  AARCH64_ARCH("armv8-a",   generic_armv8_a,   V8A,   8,  (SIMD))
>  AARCH64_ARCH("armv8.1-a", generic_armv8_a,   V8_1A, 8,  (V8A, LSE, 
> CRC, RDMA))
>  AARCH64_ARCH("armv8.2-a", generic_armv8_a,   V8_2A, 8,  (V8_1A))
> -AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, 
> PAUTH, RCPC))
> +AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, 
> PAUTH, RCPC, FCMA))
>  AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
> F16FML, DOTPROD, FLAGM))
>  AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
> SSBS, PREDRES))
>  AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, 
> I8MM, BF16))
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 6998627f377..4732c20ec96 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -193,6 +193,7 @@ AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), 
> (), (), "svesm4")
>  AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
>  
>  AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme")
> +AARCH64_OPT_EXTENSION("", FCMA, (), (), (), "fcma")
>  
>  AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index a99e7bb6c47..c0ad305e324 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -362,7 +362,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE 
> ATTRIBUTE_UNUSED
>  #define TARGET_JSCVT (TARGET_FLOAT && TARGET_ARMV8_3)
>  
>  /* Armv8.3-a Complex number extension to AdvSIMD extensions.  */
> -#define TARGET_COMPLEX (TARGET_SIMD && TARGET_ARMV8_3)
> +#define TARGET_COMPLEX (TARGET_SIMD && AARCH64_HAVE_ISA (FCMA))
>  
>  /* Floating-point rounding instructions from Armv8.5-a.  */
>  #define TARGET_FRINT (AARCH64_HAVE_ISA (V8_5A) && TARGET_FLOAT)

This patch doesn't work (as I know you're already aware).  I've posted a more
complete patch to split out FCMA, which can replace this one.
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664568.html

[PATCH] libstdc++: Implement LWG 3664 changes to ranges::distance

2024-10-04 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/backports?

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (__distance_fn::operator()):
Adjust iterator/sentinel overloads as per LWG 3664.
* testsuite/24_iterators/range_operations/distance.cc:
Test LWG 3664 example.
---
 libstdc++-v3/include/bits/ranges_base.h| 14 +++---
 .../24_iterators/range_operations/distance.cc  | 11 +++
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index 137c3c98e14..cb2eba1f841 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -947,7 +947,9 @@ namespace ranges
 
   struct __distance_fn final
   {
-template<input_or_output_iterator _It, sentinel_for<_It> _Sent>
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3664. LWG 3392 broke std::ranges::distance(a, a+3)
+template<typename _It, sentinel_for<_It> _Sent>
   requires (!sized_sentinel_for<_Sent, _It>)
   constexpr iter_difference_t<_It>
   operator()[[nodiscard]](_It __first, _Sent __last) const
@@ -961,13 +963,11 @@ namespace ranges
return __n;
   }
 
-template<input_or_output_iterator _It, sized_sentinel_for<_It> _Sent>
+template<typename _It, sized_sentinel_for<decay_t<_It>> _Sent>
   [[nodiscard]]
-  constexpr iter_difference_t<_It>
-  operator()(const _It& __first, const _Sent& __last) const
-  {
-   return __last - __first;
-  }
+  constexpr iter_difference_t<decay_t<_It>>
+  operator()(_It&& __first, _Sent __last) const
+  { return __last - static_cast<const decay_t<_It>&>(__first); }
 
 template<range _Range>
   [[nodiscard]]
diff --git a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc 
b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
index 9a1d0c3efe8..336956936c2 100644
--- a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
+++ b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
@@ -144,6 +144,16 @@ test05()
   VERIFY( std::ranges::distance(c4) == 5 );
 }
 
+void
+test06()
+{
+  // LWG 3664 - LWG 3392 broke std::ranges::distance(a, a+3)
+  int a[] = {1, 2, 3};
+  VERIFY( std::ranges::distance(a, a+3) == 3 );
+  VERIFY( std::ranges::distance(a, a) == 0 );
+  VERIFY( std::ranges::distance(a+3, a) == -3 );
+}
+
 int
 main()
 {
@@ -152,4 +162,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
-- 
2.47.0.rc1

Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Sam James

Jakub Jelinek  writes:

> On Fri, Oct 04, 2024 at 12:52:11PM +0100, Jonathan Wakely wrote:
>> This doesn't really belong in our testsuite, because the sole purpose of
>> the new test is to find bugs in the Glibc wrappers (like the one linked
>> below). But maybe it's a kindness to do it in our testsuite, because we
>> already have this test in place, and one Glibc bug was already found
>> thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
>> 
>> Should we do this?
>
> I think so.  While those bugs are glibc bugs, libstdc++ uses libc headers
> and so if they have namespace cleanness issues, so does libstdc++.
>
>> Add a new testcase that repeats 17_intro/names.cc but with
>> _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
>> https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
>> now).

I think yes as well -- we've had a lot of discussions in glibc about
getting to a place where we have tests to check the usability of headers
(there's some for this specific namespace problem but there's some
bigger stuff wrt parsing from Clang and so on) but we're not there yet.

This feels like a cheap way of catching issues, and the fact that nobody
noticed between 2.35 and 2.40 (i.e. ~3 years) means it's worthwhile IMO.

>> 
>> libstdc++-v3/ChangeLog:
>> 
>>  PR libstdc++/116210
>>  * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
>>  that use it in the fortify wrappers.
>>  * testsuite/17_intro/names_fortify.cc: New test.
>
>   Jakub

thanks,
sam

Re: [PATCH 1/2] gcc: make Valgrind errors fatal during bootstrap

2024-10-04 Thread Sam James

Jeff Law  writes:

> On 10/2/24 8:39 PM, Sam James wrote:
>> Valgrind doesn't error out by default which means bootstrap issues like
>> in PR116945 can easily be missed: pass --error-exitcode=1 to handle this.
>> While here, also set --trace-children=yes to cover child processes
>> of tools invoked during the build.
>> Note that this only handles tools invoked during the build; it doesn't
>> cover everything that --enable-checking=valgrind does.
>> gcc/ChangeLog:
>>  PR other/116945
>>  PR other/116947
>>  * configure: Regenerate.
>>  * configure.ac (valgrind_cmd): Pass additional options.
> But is this going to cause all bootstraps with Ada to fail?  That's
> how I read 116945 which was closed as WONTFIX.  Or am I
> mis-interpreting that BZ and its interaction with this patch?

No, you're right, I consider this on pause unless/until we figure out
that bug -- I'm speaking with mjw about some ideas.

>
> jeff

thanks,
sam

Re: [PATCH 2/3] aarch64: libgcc: add prototypes in cpuinfo

2024-10-04 Thread Kyrylo Tkachov




> On 3 Oct 2024, at 21:44, Christophe Lyon  wrote:
> 
> 
> Add prototypes for __init_cpu_features_resolver and
> __init_cpu_features to avoid warnings due to -Wmissing-prototypes.
> 
>libgcc/
>* config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
>prototype.
>(__init_cpu_features): Likewise.
> ---
> libgcc/config/aarch64/cpuinfo.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> index 4b94fca8695..c62a7453e8e 100644
> --- a/libgcc/config/aarch64/cpuinfo.c
> +++ b/libgcc/config/aarch64/cpuinfo.c
> @@ -418,6 +418,7 @@ __init_cpu_features_constructor(unsigned long hwcap,
>   setCPUFeature(FEAT_INIT);
> }
> 
> +void __init_cpu_features_resolver(unsigned long, const __ifunc_arg_t *);
> void
> __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
>   if (__aarch64_cpu_features.features)
> @@ -425,6 +426,7 @@ __init_cpu_features_resolver(unsigned long hwcap, const 
> __ifunc_arg_t *arg) {
>   __init_cpu_features_constructor(hwcap, arg);
> }
> 
> +void __init_cpu_features(void);
> void __attribute__ ((constructor))
> __init_cpu_features(void) {
>   unsigned long hwcap;

I thought the intent of the missing-prototypes warning is to warn about missing 
prototypes in a header file primarily.
Should these prototypes go into gcc/common/config/aarch64/cpuinfo.h instead?
Thanks,
Kyrill
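
For readers less familiar with the warning being discussed: GCC documents -Wmissing-prototypes as warning when a global function is defined without a previous prototype declaration. A minimal sketch of the two usual ways to satisfy it follows. All names are hypothetical and this is not the libgcc code; which option suits cpuinfo.c is exactly the open question in this thread.

```c
/* Hypothetical names throughout; a sketch, not the libgcc sources.  */

/* Option 1: the function is only used in this file, so give it
   internal linkage.  The warning only concerns global functions.  */
static int
local_helper (int x)
{
  return x + 1;
}

/* Option 2: the function has to stay external (as a constructor or a
   resolver entry point might), so declare it before defining it.  In
   a larger project this declaration would normally live in a header
   shared with the callers.  */
int exported_entry (int);

int
exported_entry (int x)
{
  return local_helper (x) * 2;
}
```

Option 2 with the declaration kept in the .c file is what the patch does; moving it to a header only helps if every includer can see the types used in the signature.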

> --
> 2.34.1
>

Re: [PATCH 3/3] Record template specialization hash

2024-10-04 Thread Richard Biener

On Thu, 3 Oct 2024, Jason Merrill wrote:

> On 10/2/24 7:53 AM, Richard Biener wrote:
> > For a specific testcase a lot of compile-time is spent in re-hashing
> > hashtable elements upon expansion.  The following records the hash
> > in the hash element.  This speeds up compilation by 20%.
> > 
> > There's probably module-related uses that need to be adjusted.
> > 
> > Bootstrap failed (guess I was expecting this), but still I think this
> > is a good idea - maybe somebody can pick it up.
> 
> Applying the attached, thanks!
> 
> > Possibly instead of having a single global hash table having one per ID
> > would be
> > better.
> 
> That sounds excessive to me.  Is the actual hashtable lookup significant in
> the profile?

No, it's still template hashing at the top.
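
As an aside, the idea from the quoted patch description (record the hash in the element so that table expansion does not recompute it) can be sketched in isolation. This is a toy open-addressing table with made-up names, not the GCC hash_table or the C++ front end's specialization table, and it skips duplicate detection and deletion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts how often the (pretend-expensive) hash function runs.  */
static unsigned long hash_calls;

static unsigned long
expensive_hash (const char *key)
{
  unsigned long h = 5381;
  hash_calls++;
  for (; *key; key++)
    h = h * 33 + (unsigned char) *key;
  return h;
}

struct entry
{
  const char *key;     /* NULL marks an empty slot.  */
  unsigned long hash;  /* Recorded once, reused on every expansion.  */
};

struct table
{
  struct entry *slots;
  size_t size;         /* Always a power of two.  */
  size_t count;
};

/* Place E using its recorded hash; the key is never re-hashed.  */
static void
place (struct entry *slots, size_t size, struct entry e)
{
  size_t i = e.hash & (size - 1);
  while (slots[i].key)
    i = (i + 1) & (size - 1);
  slots[i] = e;
}

/* Double the table, moving entries by their stored hashes.  */
static void
expand (struct table *t)
{
  size_t new_size = t->size * 2;
  struct entry *new_slots = calloc (new_size, sizeof *new_slots);
  if (!new_slots)
    abort ();
  for (size_t i = 0; i < t->size; i++)
    if (t->slots[i].key)
      place (new_slots, new_size, t->slots[i]);
  free (t->slots);
  t->slots = new_slots;
  t->size = new_size;
}

/* Insert KEY (assumed not already present), hashing it exactly once.  */
static void
insert (struct table *t, const char *key)
{
  if ((t->count + 1) * 2 > t->size)
    expand (t);
  struct entry e = { key, expensive_hash (key) };
  place (t->slots, t->size, e);
  t->count++;
}

/* Insert N distinct keys and return how many hash computations ran.  */
static unsigned long
run_demo (size_t n)
{
  struct table t = { calloc (4, sizeof (struct entry)), 4, 0 };
  char (*keys)[16] = malloc ((n + 1) * sizeof *keys);
  if (!t.slots || !keys)
    abort ();
  hash_calls = 0;
  for (size_t i = 0; i < n; i++)
    {
      snprintf (keys[i], sizeof keys[i], "key%zu", i);
      insert (&t, keys[i]);
    }
  assert (t.count == n);
  unsigned long calls = hash_calls;
  free (keys);
  free (t.slots);
  return calls;
}
```

With the hash stored per element, run_demo performs one hash computation per inserted key however many times the table grows. The cost is one extra word per entry.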

> > The hashtable also keeps things GC-live ('args' for example).
> 
> Those args should also be referenced by TI_ARGS from the respective template
> specialization.

I see.  The changes improved things, the biggest fruit we can possibly
still reap is coerce_template_parameter_pack causing 6GB transitional
garbage we could release earlier (the packed_args allocation at the
start of the function, the testcase ticks the last one,
packed_args = make_tree_vec (nargs - arg_idx)).

I've pasted the testcase below - it looks innocuous and I suspect
filling the templates with actual "meat" would shift the blame from
argument/type vectors to elsewhere?

clang++-17 just blew past my little machine's 32GB of memory so at least
we're not worst here.

Richard.



template 
  struct Add
  {};

template 
  struct Operand
  {};

template 
  Operand
  operator+(const Operand&, const Operand&)
  { return {}; }

auto
stress_me(auto x)
{
  return (x + x) + x + (x + (x + x) + x) + x + x;
}

auto
apply_stress(auto op)
{
  return stress_me(stress_me(stress_me(stress_me(stress_me(op);
}


template 
  struct typelist
  {};

void
invoke(auto);

template 
  void
  apply(typelist, auto&& fun)
  {
fun(T0{});
if constexpr (sizeof...(Ts))
  apply(typelist(), fun);
  }


template 
  Operand
  make_operand(T)
  { return {}; }

auto
pah()
{

  apply(typelist(),
[](auto op) {
  apply(typelist(),
[&op](auto op2) {
  invoke(apply_stress(make_operand(op) + make_operand(op2)));
});
});



}

[PATCH] aarch64: Fix bug with max/min (PR116934)

2024-10-04 Thread saurabh.jha


In ac4cdf5cb43c0b09e81760e2a1902ceebcf1a135, I introduced a bug where
I put the new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN, into the
wrong iterator.

I should have put the new unspecs in SVE_COND_FP_MAXMIN, but I put them
in SVE_COND_FP_BINARY_REG instead. That was incorrect because
SVE_COND_FP_MAXMIN, not SVE_COND_FP_BINARY_REG, is the iterator used for
predicated floating-point maximum/minimum.

Also added a testcase to validate the new change.

Regression tested on aarch64-unknown-linux-gnu and found no regressions.
There are some test cases with "libitm" in their directory names which
appear in compare_tests output as changed tests but it looks like they
are in the output just because of changed build directories, like from
build-patched/aarch64-unknown-linux-gnu/./libitm/* to
build-pristine/aarch64-unknown-linux-gnu/./libitm/*. I didn't think it
was a cause of concern and have pushed this for review.

gcc/ChangeLog:

* config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
UNSPEC_COND_SMIN to correct iterators.

gcc/testsuite/ChangeLog:

PR target/116934
* gcc.target/aarch64/sve2/pr116934.c: New test.
---
 gcc/config/aarch64/iterators.md  |  8 
 gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c | 13 +
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0836dee61c9..fcad236eee9 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3125,9 +3125,7 @@
 
 (define_int_iterator SVE_COND_FP_BINARY_REG
   [UNSPEC_COND_FDIV
-   UNSPEC_COND_FMULX
-   UNSPEC_COND_SMAX
-   UNSPEC_COND_SMIN])
+   UNSPEC_COND_FMULX])
 
 (define_int_iterator SVE_COND_FCADD [UNSPEC_COND_FCADD90
  UNSPEC_COND_FCADD270])
@@ -3135,7 +3133,9 @@
 (define_int_iterator SVE_COND_FP_MAXMIN [UNSPEC_COND_FMAX
 	 UNSPEC_COND_FMAXNM
 	 UNSPEC_COND_FMIN
-	 UNSPEC_COND_FMINNM])
+	 UNSPEC_COND_FMINNM
+	 UNSPEC_COND_SMAX
+	 UNSPEC_COND_SMIN])
 
 (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
 	  UNSPEC_COND_FMLS
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
new file mode 100644
index 000..94fb96ffa7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -mcpu=neoverse-v2" } */
+
+int a;
+float *b;
+
+void foo() {
+  for (; a; a--, b += 4) {
+b[0] = b[1] = b[2] = b[2] > 0 ?: 0;
+if (b[3] < 0)
+  b[3] = 0;
+  }
+}

Re: [PATCH] i386: Fix up ix86_expand_int_compare with TImode comparisons of SUBREGs from V8{H,B}Fmode against zero [PR116921]

2024-10-04 Thread Uros Bizjak

On Fri, Oct 4, 2024 at 12:12 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs, because the ix86_expand_int_compare
> optimization to use {,v}ptest assumes there are instructions for all
> 16-byte vector modes.  That isn't the case, we only have one for
> V16QI, V8HI, V4SI, V2DI, V1TI, V4SF and V2DF, not for
> V8HF nor V8BF.
>
> The following patch fixes that by using the V8HI instruction instead
> for those 2 modes.  tmp can't be a SUBREG, because it is SUBREG_REG
> of another SUBREG, so we don't need to worry about gen_lowpart
> failing.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-10-04  Jakub Jelinek  
>
> PR target/116921
> * config/i386/i386-expand.cc (ix86_expand_int_compare): Add a SUBREG
> to V8HImode from V8HFmode or V8BFmode before generating a ptest.
>
> * gcc.target/i386/pr116921.c: New test.

OK.

Thanks,
Uros.
>
> --- gcc/config/i386/i386-expand.cc.jj   2024-10-03 17:27:28.328227793 +0200
> +++ gcc/config/i386/i386-expand.cc  2024-10-03 18:11:18.514076904 +0200
> @@ -3095,6 +3095,9 @@ ix86_expand_int_compare (enum rtx_code c
>&& GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
>  {
>tmp = SUBREG_REG (op0);
> +  if (GET_MODE_INNER (GET_MODE (tmp)) == HFmode
> + || GET_MODE_INNER (GET_MODE (tmp)) == BFmode)
> +   tmp = gen_lowpart (V8HImode, tmp);
>tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
>  }
>else
> --- gcc/testsuite/gcc.target/i386/pr116921.c.jj 2024-10-03 18:16:36.368711747 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr116921.c2024-10-03 18:17:25.702034243 
> +0200
> @@ -0,0 +1,12 @@
> +/* PR target/116921 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -msse4" } */
> +
> +long x;
> +_Float16 __attribute__((__vector_size__ (16))) f;
> +
> +void
> +foo (void)
> +{
> +  x -= !(__int128) (f / 2);
> +}
>
> Jakub
>

[PATCH] aarch64: Handle SVE modes in aarch64_evpc_reencode

2024-10-04 Thread Richard Sandiford

For Advanced SIMD modes, aarch64_evpc_reencode tests whether
a permute in a narrow element mode can be done more cheaply
in a wider mode.  For example, { 0, 1, 8, 9, 4, 5, 12, 13 }
on V8HI is a natural TRN1 on V4SI ({ 0, 4, 2, 6 }).

This patch extends the code to handle SVE data and predicate
modes as well.  This is a prerequisite to getting good results
for PR116583.
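
To make the V8HI -> V4SI example concrete, here is a rough standalone sketch of the index check involved. It is not the GCC implementation: the helper name is made up, and it only models my reading of what shrinking a permute vector by a factor of 2 has to verify, namely that each adjacent pair of indices selects an aligned pair (2k, 2k + 1) of source elements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: try to rewrite a permute over N narrow elements
   as a permute over N / 2 elements of twice the width.  Returns true
   and fills WIDE (N / 2 entries) on success.  */
static bool
shrink_permute_by_2 (const unsigned *narrow, size_t n, unsigned *wide)
{
  if (n % 2 != 0)
    return false;
  for (size_t i = 0; i < n; i += 2)
    {
      /* The pair must start on an even index and be consecutive.  */
      if (narrow[i] % 2 != 0 || narrow[i + 1] != narrow[i] + 1)
        return false;
      wide[i / 2] = narrow[i] / 2;
    }
  return true;
}
```

Feeding it { 0, 1, 8, 9, 4, 5, 12, 13 } gives { 0, 4, 2, 6 }, matching the TRN1 case above, whereas something like { 0, 2, 1, 3 } is rejected and the permute stays in the narrow mode.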

Tested on aarch64-linux-gnu (with and without SVE enabled by default).
I'll push on Monday if there are no comments before then.

Thanks,
Richard


gcc/
PR target/116583
* config/aarch64/aarch64.cc (aarch64_coalesce_units): New function,
extending the Advanced SIMD handling from...
(aarch64_evpc_reencode): ...here to SVE data and predicate modes.

gcc/testsuite/
PR target/116583
* gcc.target/aarch64/sve/permute_1.c: New test.
* gcc.target/aarch64/sve/permute_2.c: Likewise.
* gcc.target/aarch64/sve/permute_3.c: Likewise.
* gcc.target/aarch64/sve/permute_4.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc |  55 +++-
 .../gcc.target/aarch64/sve/permute_1.c| 106 +++
 .../gcc.target/aarch64/sve/permute_2.c| 277 ++
 .../gcc.target/aarch64/sve/permute_3.c|  91 ++
 .../gcc.target/aarch64/sve/permute_4.c| 113 +++
 5 files changed, 633 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_4.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e7bb3278a27..102680a0efc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1933,6 +1933,46 @@ aarch64_sve_int_mode (machine_mode mode)
   return aarch64_sve_data_mode (int_mode, GET_MODE_NUNITS (mode)).require ();
 }
 
+/* Look for a vector mode with the same classification as VEC_MODE,
+   but with each group of FACTOR elements coalesced into a single element.
+   In other words, look for a mode in which the elements are FACTOR times
+   larger and in which the number of elements is FACTOR times smaller.
+
+   Return the mode found, if one exists.  */
+
+static opt_machine_mode
+aarch64_coalesce_units (machine_mode vec_mode, unsigned int factor)
+{
+  auto elt_bits = vector_element_size (GET_MODE_BITSIZE (vec_mode),
+  GET_MODE_NUNITS (vec_mode));
+  auto vec_flags = aarch64_classify_vector_mode (vec_mode);
+  if (vec_flags & VEC_SVE_PRED)
+{
+  if (known_eq (GET_MODE_SIZE (vec_mode), BYTES_PER_SVE_PRED))
+   return aarch64_sve_pred_mode (elt_bits * factor);
+  return {};
+}
+
+  scalar_mode new_elt_mode;
+  if (!int_mode_for_size (elt_bits * factor, false).exists (&new_elt_mode))
+return {};
+
+  if (vec_flags == VEC_ADVSIMD)
+{
+  auto mode = aarch64_simd_container_mode (new_elt_mode,
+  GET_MODE_BITSIZE (vec_mode));
+  if (mode != word_mode)
+   return mode;
+}
+  else if (vec_flags & VEC_SVE_DATA)
+{
+  poly_uint64 new_nunits;
+  if (multiple_p (GET_MODE_NUNITS (vec_mode), factor, &new_nunits))
+   return aarch64_sve_data_mode (new_elt_mode, new_nunits);
+}
+  return {};
+}
+
 /* Implement TARGET_VECTORIZE_RELATED_MODE.  */
 
 static opt_machine_mode
@@ -25731,26 +25771,23 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d)
 {
   expand_vec_perm_d newd;
 
-  if (d->vec_flags != VEC_ADVSIMD)
+  /* The subregs that we'd create are not supported for big-endian SVE;
+ see aarch64_modes_compatible_p for details.  */
+  if (BYTES_BIG_ENDIAN && (d->vec_flags & VEC_ANY_SVE))
 return false;
 
   /* Get the new mode.  Always twice the size of the inner
  and half the elements.  */
-  poly_uint64 vec_bits = GET_MODE_BITSIZE (d->vmode);
-  unsigned int new_elt_bits = GET_MODE_UNIT_BITSIZE (d->vmode) * 2;
-  auto new_elt_mode = int_mode_for_size (new_elt_bits, false).require ();
-  machine_mode new_mode = aarch64_simd_container_mode (new_elt_mode, vec_bits);
-
-  if (new_mode == word_mode)
+  machine_mode new_mode;
+  if (!aarch64_coalesce_units (d->vmode, 2).exists (&new_mode))
 return false;
 
   vec_perm_indices newpermindices;
-
   if (!newpermindices.new_shrunk_vector (d->perm, 2))
 return false;
 
   newd.vmode = new_mode;
-  newd.vec_flags = VEC_ADVSIMD;
+  newd.vec_flags = d->vec_flags;
   newd.op_mode = newd.vmode;
   newd.op_vec_flags = newd.vec_flags;
   newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/permute_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/permute_1.c
new file mode 100644
index 000..90aeef32188
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/permute_1.c
@@ -0,0 +1,106 @@
+/* { dg-options "-O -msve-

[PATCH] aarch64: Fix general permutes of svbfloat16_ts

2024-10-04 Thread Richard Sandiford

Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC
patch triggered an unrecognisable insn ICE for the svbfloat16_t tests.
This was because the implementation of general two-vector permutes
requires two TBLs and an ORR, with the ORR being represented as an
unspec for floating-point modes.  The associated pattern did not
cover VNx8BF.

Tested on aarch64-linux-gnu (with and without SVE enabled by default).
I'll push on Monday if there are no comments before then.

Thanks,
Richard


gcc/
* iterators.md (SVE_I): Move further up file.
(SVE_F): New mode iterator.
(SVE_ALL): Redefine in terms of SVE_I and SVE_F.
* config/aarch64/aarch64-sve.md (*<optab><mode>3): Extend
to all SVE_F.

gcc/testsuite/
* gcc.target/aarch64/sve/permute_5.c: New test.
---
 gcc/config/aarch64/aarch64-sve.md |  8 +++---
 gcc/config/aarch64/iterators.md   | 27 +--
 .../gcc.target/aarch64/sve/permute_5.c| 10 +++
 3 files changed, 27 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_5.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index ec1d059a2b1..90db51e51b9 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6455,10 +6455,10 @@ (define_expand "@aarch64_frecps"
 ;; by providing this, but we need to use UNSPECs since rtx logical ops
 ;; aren't defined for floating-point modes.
 (define_insn "*<optab><mode>3"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
-   (unspec:SVE_FULL_F
- [(match_operand:SVE_FULL_F 1 "register_operand" "w")
-  (match_operand:SVE_FULL_F 2 "register_operand" "w")]
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+   (unspec:SVE_F
+ [(match_operand:SVE_F 1 "register_operand" "w")
+  (match_operand:SVE_F 2 "register_operand" "w")]
  LOGICALF))]
   "TARGET_SVE"
   "<logicalf_op>\t%0.d, %1.d, %2.d"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0836dee61c9..0f19cae73c9 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -519,15 +519,20 @@ (define_mode_iterator SVE_PARTIAL_I [VNx8QI VNx4QI VNx2QI
 VNx4HI VNx2HI
 VNx2SI])
 
+;; All SVE integer vector modes.
+(define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
+VNx8HI VNx4HI VNx2HI
+VNx4SI VNx2SI
+VNx2DI])
+
+;; All SVE floating-point vector modes.
+(define_mode_iterator SVE_F [VNx8HF VNx4HF VNx2HF
+VNx8BF VNx4BF VNx2BF
+VNx4SF VNx2SF
+VNx2DF])
+
 ;; All SVE vector modes.
-(define_mode_iterator SVE_ALL [VNx16QI VNx8QI VNx4QI VNx2QI
-  VNx8HI VNx4HI VNx2HI
-  VNx8HF VNx4HF VNx2HF
-  VNx8BF VNx4BF VNx2BF
-  VNx4SI VNx2SI
-  VNx4SF VNx2SF
-  VNx2DI
-  VNx2DF])
+(define_mode_iterator SVE_ALL [SVE_I SVE_F])
 
 ;; All SVE 2-vector modes.
 (define_mode_iterator SVE_FULLx2 [VNx32QI VNx16HI VNx8SI VNx4DI
@@ -549,12 +554,6 @@ (define_mode_iterator SVE_STRUCT [SVE_FULLx2 SVE_FULLx3 
SVE_FULLx4])
 ;; All SVE vector and structure modes.
 (define_mode_iterator SVE_ALL_STRUCT [SVE_ALL SVE_STRUCT])
 
-;; All SVE integer vector modes.
-(define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
-VNx8HI VNx4HI VNx2HI
-VNx4SI VNx2SI
-VNx2DI])
-
 ;; All SVE integer vector modes and Advanced SIMD 64-bit vector
 ;; element modes
 (define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI])
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/permute_5.c 
b/gcc/testsuite/gcc.target/aarch64/sve/permute_5.c
new file mode 100644
index 000..786b05ee3e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/permute_5.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O -msve-vector-bits=256" } */
+
+typedef __SVBfloat16_t vbfloat16 __attribute__((arm_sve_vector_bits(256)));
+
+vbfloat16
+foo (vbfloat16 x, vbfloat16 y)
+{
+  return __builtin_shufflevector (x, y, 0, 2, 1, 3, 16, 19, 17, 18,
+ 8, 9, 10, 11, 23, 22, 21, 20);
+}
-- 
2.25.1

[PATCH] x86: Disable stack protector for naked functions

2024-10-04 Thread H.J. Lu

Since naked functions should not enable stack protector, define
TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
for naked functions.
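
As a rough illustration of the decision the new hook makes, here is a toy model. `FunctionInfo` and both helper functions are hypothetical stand-ins for `current_function_decl`, `ix86_function_naked` and the middle-end check; they are assumptions for illustration, not GCC's actual API.

```cpp
// Toy model only: FunctionInfo and both helpers are hypothetical
// stand-ins, not GCC internals.
struct FunctionInfo
{
  bool naked;               // models __attribute__ ((naked))
  bool protector_requested; // models e.g. -fstack-protector-all
};

// Same shape as ix86_stack_protect_runtime_enabled_p in the patch:
// the target vetoes the stack protector for naked functions.
inline bool
stack_protect_runtime_enabled_p (const FunctionInfo &fn)
{
  return !fn.naked;
}

// Instrumentation is emitted only if it was requested and the
// target hook did not veto it.
inline bool
emit_stack_protector_p (const FunctionInfo &fn)
{
  return fn.protector_requested && stack_protect_runtime_enabled_p (fn);
}
```

With this model, a naked function never gets a guard even under -fstack-protector-all, which is what the new pr116962.c test scans for.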

gcc/

PR target/116962
* config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
function.
(TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.

gcc/testsuite/

PR target/116962
* gcc.target/i386/pr116962.c: New file.

OK for master?

Thanks.

--
H.J.
From 99ab364f6657c2d2e5e4a389b07b00c12d4bad0d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 4 Oct 2024 16:21:15 +0800
Subject: [PATCH] x86: Disable stack protector for naked functions

Since naked functions should not enable stack protector, define
TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
for naked functions.

gcc/

	PR target/116962
	* config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
	function.
	(TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.

gcc/testsuite/

	PR target/116962
	* gcc.target/i386/pr116962.c: New file.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc  | 11 +++
 gcc/testsuite/gcc.target/i386/pr116962.c | 10 ++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116962.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad2e7b447ff..90a564b2ffa 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24435,6 +24435,13 @@ ix86_stack_protect_guard (void)
   return default_stack_protect_guard ();
 }
 
+static bool
+ix86_stack_protect_runtime_enabled_p (void)
+{
+  /* Naked functions should not enable stack protector.  */
+  return !ix86_function_naked (current_function_decl);
+}
+
 /* For 32-bit code we can save PIC register setup by using
__stack_chk_fail_local hidden function instead of calling
__stack_chk_fail directly.  64-bit code doesn't need to setup any PIC
@@ -26821,6 +26828,10 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
+#undef TARGET_STACK_PROTECT_RUNTIME_ENABLED_P
+#define TARGET_STACK_PROTECT_RUNTIME_ENABLED_P \
+  ix86_stack_protect_runtime_enabled_p
+
 #if !TARGET_MACHO
 #undef TARGET_STACK_PROTECT_FAIL
 #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
diff --git a/gcc/testsuite/gcc.target/i386/pr116962.c b/gcc/testsuite/gcc.target/i386/pr116962.c
new file mode 100644
index 00000000000..ced16eee746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116962.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target fstack_protector } } */
+/* { dg-options "-O2 -fstack-protector-all" } */
+/* { dg-final { scan-assembler-not "__stack_chk_fail" } } */
+
+__attribute__ ((naked))
+void
+foo (void)
+{
+  asm ("ret");
+}
-- 
2.46.2

[PATCH] Fixup dumping of re-trying without/with single-lane SLP

2024-10-04 Thread Richard Biener

The following fixes the order of decrementing the SLP mode and
the dumping.
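
To see why the order matters, here is a small stand-alone sketch. It assumes the encoding suggested by the comment in the patch (2 = multi-lane SLP, 1 = single-lane SLP, 0 = SLP disabled) and assumes that the message for the non-zero case reads "re-trying with single-lane SLP"; both function names are hypothetical.

```cpp
#include <string>

// Assumed encoding: 2 = multi-lane SLP, 1 = single-lane SLP,
// 0 = SLP disabled.  Both helpers are illustrative only.

// Old order: pick the message first, decrement afterwards.
inline std::string
retry_message_old (int &slp)
{
  std::string msg = slp ? "re-trying with single-lane SLP"
                        : "re-trying with SLP disabled";
  --slp;
  return msg;
}

// New order: decrement first, so the message describes the stage
// that is about to start.
inline std::string
retry_message_new (int &slp)
{
  --slp;
  return slp ? "re-trying with single-lane SLP"
             : "re-trying with SLP disabled";
}
```

Starting from single-lane SLP (1), the old order would announce single-lane SLP again even though the retry actually runs with SLP disabled.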

Built on x86_64-unknown-linux-gnu, pushed.

* tree-vect-loop.cc (vect_analyze_loop_2): Decrement 'slp'
before dumping which stage we're starting.
---
 gcc/tree-vect-loop.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3a9eca289d8..3d62fecfae1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3275,6 +3275,9 @@ again:
}
 }
 
+  /* Roll back state appropriately.  Degrade SLP this time.  From multi-
+ to single-lane to disabled.  */
+  --slp;
   if (dump_enabled_p ())
 {
   if (slp)
@@ -3285,9 +3288,6 @@ again:
 "re-trying with SLP disabled\n");
 }
 
-  /* Roll back state appropriately.  Degrade SLP this time.  From multi-
- to single-lane to disabled.  */
-  --slp;
   /* Restore vectorization factor as it were without SLP.  */
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = saved_vectorization_factor;
   /* Free the SLP instances.  */
-- 
2.43.0

Re: [PATCH 1/2] c++: add -Wdeprecated-literal-operator [CWG2521]

2024-10-04 Thread Jakub Jelinek

On Fri, Oct 04, 2024 at 12:19:03PM +0200, Jakub Jelinek wrote:
> Though, maybe the tests should have both the deprecated syntax and the
> non-deprecated one...

Here is a variant of the patch which does that.

Tested on x86_64-linux and i686-linux, ok for trunk?

2024-10-04  Jakub Jelinek  

* g++.dg/cpp26/unevalstr1.C: Revert the 2024-10-03 changes, instead
expect extra warnings.  Add another set of tests without space
between " and _.
* g++.dg/cpp26/unevalstr2.C: Expect extra warnings for C++23.  Add
another set of tests without space between " and _.
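
For reference, a minimal sketch of the two spellings the tests now cover. The suffix `_kib` is a made-up example, not something from the testsuite.

```cpp
// Non-deprecated spelling: no whitespace between "" and the suffix.
constexpr unsigned long long
operator ""_kib (unsigned long long n)
{
  return n * 1024;
}

// Spelling deprecated by CWG2521 (still accepted, but expected to
// trigger the new warning):
//   constexpr unsigned long long operator "" _kib (unsigned long long n);

static_assert (4_kib == 4096, "usable in constant expressions");
```

Only the declaration syntax is affected; uses such as `4_kib` are written the same way for both forms.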

--- gcc/testsuite/g++.dg/cpp26/unevalstr1.C.jj  2024-10-04 12:28:08.820899177 
+0200
+++ gcc/testsuite/g++.dg/cpp26/unevalstr1.C 2024-10-04 14:15:35.563531334 
+0200
@@ -83,21 +83,57 @@ extern "\o{0103}" { int f14 (); }   // { d
 [[nodiscard ("\x{20}")]] int h19 ();   // { dg-error "numeric escape sequence 
in unevaluated string" }
 [[nodiscard ("\h")]] int h20 ();   // { dg-error "unknown escape sequence" 
}
 
-float operator ""_my0 (const char *);
-float operator "" ""_my1 (const char *);
-float operator L""_my2 (const char *); // { dg-error "invalid encoding prefix 
in literal operator" }
-float operator u""_my3 (const char *); // { dg-error "invalid encoding prefix 
in literal operator" }
-float operator U""_my4 (const char *); // { dg-error "invalid encoding prefix 
in literal operator" }
-float operator u8""_my5 (const char *);// { dg-error "invalid encoding 
prefix in literal operator" }
-float operator L"" ""_my6 (const char *);  // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator u"" ""_my7 (const char *);  // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator U"" ""_my8 (const char *);  // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator u8"" ""_my9 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator "" L""_my10 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator "" u""_my11 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator "" U""_my12 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
-float operator "" u8""_my13 (const char *);// { dg-error "invalid encoding 
prefix in literal operator" }
-float operator "\0"_my14 (const char *);   // { dg-error "expected empty 
string after 'operator' keyword" }
-float operator "\x00"_my15 (const char *); // { dg-error "expected empty 
string after 'operator' keyword" }
-float operator "\h"_my16 (const char *);   // { dg-error "expected empty 
string after 'operator' keyword" }
+float operator "" _my0 (const char *);
+float operator "" "" _my1 (const char *);
+float operator L"" _my2 (const char *);// { dg-error "invalid 
encoding prefix in literal operator" }
+float operator u"" _my3 (const char *);// { dg-error "invalid 
encoding prefix in literal operator" }
+float operator U"" _my4 (const char *);// { dg-error "invalid 
encoding prefix in literal operator" }
+float operator u8"" _my5 (const char *);   // { dg-error "invalid encoding 
prefix in literal operator" }
+float operator L"" "" _my6 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
+float operator u"" "" _my7 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
+float operator U"" "" _my8 (const char *); // { dg-error "invalid encoding 
prefix in literal operator" }
+float operator u8"" "" _my9 (const char *);// { dg-error "invalid encoding 
prefix in literal operator" }
+float operator "" L"" _my10 (const char *);// { dg-error "invalid encoding 
prefix in literal operator" }
+float operator "" u"" _my11 (const char *);// { dg-error "invalid encoding 
prefix in literal operator" }
+float operator "" U"" _my12 (const char *);// { dg-error "invalid encoding 
prefix in literal operator" }
+float operator "" u8"" _my13 (const char *);   // { dg-error "invalid encoding 
prefix in literal operator" }
+float operator "\0" _my14 (const char *);  // { dg-error "expected empty 
string after 'operator' keyword" }
+float operator "\x00" _my15 (const char *);// { dg-error "expected empty 
string after 'operator' keyword" }
+float operator "\h" _my16 (const char *);  // { dg-error "expected empty 
string after 'operator' keyword" }
+   // { dg-error "unknown escape 
sequence" "" { target *-*-* } .-1 }
+// { dg-warning "space between quotes and suffix is deprecated" "" { target 
*-*-* } .-18 }
+// { dg-warning "space between quotes and suffix is deprecated" "" { target 
*-*-* } .-18 }
+// { dg-warning "space between quotes and suffix is deprecated" "" { target 
*-*-* } .-18 }
+// { dg-warning "space between quotes and suffix is deprecated" "" {

Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]

2024-10-04 Thread Jakub Jelinek

On Fri, Oct 04, 2024 at 12:52:11PM +0100, Jonathan Wakely wrote:
> This doesn't really belong in our testsuite, because the sole purpose of
> the new test is to find bugs in the Glibc wrappers (like the one linked
> below). But maybe it's a kindness to do it in our testsuite, because we
> already have this test in place, and one Glibc bug was already found
> thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
> 
> Should we do this?

I think so.  While those bugs are glibc bugs, libstdc++ uses libc headers
and so if they have namespace cleanness issues, so does libstdc++.

> Add a new testcase that repeats 17_intro/names.cc but with
> _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
> https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
> now).
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/116210
>   * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
>   that use it in the fortify wrappers.
>   * testsuite/17_intro/names_fortify.cc: New test.

Jakub

[PATCH 3/4] vect: Support more VLA SLP permutations

2024-10-04 Thread Richard Sandiford

This is the main patch for PR116583.  Previously, we only
supported VLA SLP permutations for which the output and inputs
have the same number of lanes, and for which that number of
lanes divides the number of vector elements.

The patch extends this to handle:

(1) "packs" of a single 2N-vector input into an N-vector output
(2) "unpacks" of N-vector inputs into an XN-vector output

Hopefully the comments in the code explain the approach.

The contents of the:

  for (unsigned i = 0; i < ncopies; ++i)

loop do not change; the patch simply adds an outer loop around it.

The patch removes the XFAIL in slp-13.c and also improves
the SVE vect.exp results with vect-force-slp=1.  I haven't
added new tests specifically for this, since presumably the
existing ones will cover it once the SLP switch is flipped.
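
As a rough guide to the two new cases, the classification can be modelled with plain integers. This is only a sketch: the real code works on poly_uint64 values, also requires every child to agree on the unpack factor, and all names below are hypothetical.

```cpp
#include <cstdint>

// Plain-integer model; PermKind, PermClass and classify_child are
// illustrative names, not part of tree-vect-slp.cc.
enum class PermKind { Pack, Unpack, Unsupported };

struct PermClass
{
  PermKind kind;
  unsigned factor; // unpack factor X; 1 means equal lanes per vector
};

inline PermClass
classify_child (unsigned nchildren,
                std::uint64_t child_lanes, std::uint64_t node_lanes,
                std::uint64_t nunits, std::uint64_t op_nunits)
{
  // The single input has twice as many lanes per vector as the output.
  if (nchildren == 1
      && child_lanes * nunits == node_lanes * op_nunits * 2)
    return { PermKind::Pack, 1 };

  // The output has X times as many lanes per vector as the input.
  std::uint64_t num = node_lanes * op_nunits;
  std::uint64_t den = child_lanes * nunits;
  if (den != 0 && num % den == 0)
    return { PermKind::Unpack, static_cast<unsigned> (num / den) };

  return { PermKind::Unsupported, 0 };
}
```

A factor of 1 corresponds to the previously supported case in which inputs and output have the same number of lanes per vector.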

gcc/
PR tree-optimization/116583
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Handle
variable-length pack and unpack permutations.

gcc/testsuite/
PR tree-optimization/116583
* gcc.dg/vect/slp-13.c: Remove xfail for vect_variable_length.
* gcc.dg/vect/slp-13-big-array.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/slp-13-big-array.c |   2 +-
 gcc/testsuite/gcc.dg/vect/slp-13.c   |   2 +-
 gcc/tree-vect-slp.cc | 107 ++-
 3 files changed, 82 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c 
b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
index ca70856c1dd..e45f8aab133 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
@@ -137,4 +137,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c 
b/gcc/testsuite/gcc.dg/vect/slp-13.c
index b7f947e6dbe..d6346aef978 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13.c
@@ -131,4 +131,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 470128ea775..66f5906ebb9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10194,6 +10194,13 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
   unsigned i;
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node));
+  /* True if we're permuting a single input of 2N vectors down
+ to N vectors.  This case doesn't generalize beyond 2 since
+ VEC_PERM_EXPR only takes 2 inputs.  */
+  bool pack_p = false;
+  /* If we're permuting inputs of N vectors each into X*N outputs,
+ this is the value of X, otherwise it is 1.  */
+  unsigned int unpack_factor = 1;
   tree op_vectype = NULL_TREE;
   FOR_EACH_VEC_ELT (children, i, child)
 if (SLP_TREE_VECTYPE (child))
@@ -10215,7 +10222,20 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
 "Unsupported vector types in lane permutation\n");
  return -1;
}
-  if (SLP_TREE_LANES (child) != SLP_TREE_LANES (node))
+  auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype);
+  unsigned int this_unpack_factor;
+  /* Check whether the input has twice as many lanes per vector.  */
+  if (children.length () == 1
+ && known_eq (SLP_TREE_LANES (child) * nunits,
+  SLP_TREE_LANES (node) * op_nunits * 2))
+   pack_p = true;
+  /* Check whether the output has N times as many lanes per vector.  */
+  else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
+   SLP_TREE_LANES (child) * nunits,
+

[PATCH 2/4] vect: Restructure repeating_p case for SLP permutations

2024-10-04 Thread Richard Sandiford

The repeating_p case previously handled the specific situation
in which the inputs have N lanes and the output has N lanes,
where N divides the number of vector elements.  In that case,
every output uses the same permute vector.

The code was therefore structured so that the outer loop only
constructed one permute vector, with an inner loop generating
as many VEC_PERM_EXPRs from it as required.

However, the main patch for PR116583 adds support for cycling
through N permute vectors, rather than just having one.
The current structure doesn't really handle that case well.
(We'd need to interleave the results after generating them,
which sounds a bit fragile.)

This patch instead makes the transform phase calculate each output
vector's permutation explicitly, like for the !repeating_p path.
As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.

This arguably undermines one of the justifications for using repeating_p
for constant-length vectors: that the repeating_p path involved less
work than the !repeating_p path.  That justification does still hold for
the analysis phase, though, and that should be the more time-sensitive
part.  And the other justification -- to get more coverage of the code --
still applies.  So I'd prefer that we continue to use repeating_p for
constant-length vectors unless that causes a known missed optimisation.
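
To illustrate why the original repeating_p case needs only one permute vector, here is a small fixed-length model. It assumes a single input and a lane permutation applied to every group of N lanes; `mask_for_output` is a hypothetical helper, not code from the patch.

```cpp
#include <cstddef>
#include <vector>

// Returns the permute indices for output vector V, relative to the
// start of input vector V.  PERM gives the source lane for each
// output lane within a group of perm.size () lanes.
inline std::vector<long>
mask_for_output (std::size_t v, std::size_t nunits,
                 const std::vector<std::size_t> &perm)
{
  const std::size_t n = perm.size ();
  std::vector<long> mask;
  for (std::size_t k = 0; k < nunits; ++k)
    {
      std::size_t e = v * nunits + k;              // flattened output element
      std::size_t src = (e / n) * n + perm[e % n]; // flattened input element
      mask.push_back (static_cast<long> (src)
                      - static_cast<long> (v * nunits));
    }
  return mask;
}
```

When N divides the number of vector elements, the result does not depend on V, so one mask serves every output. When it does not, the masks cycle, which is the situation the restructured loop has to cope with.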

gcc/
PR tree-optimization/116583
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
the noutputs_per_mask inner loop and instead generate a
separate permute vector for each output.
---
 gcc/tree-vect-slp.cc | 75 
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7aeda69f447..470128ea775 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10243,26 +10243,33 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
   return 1;
 }
 
-  /* REPEATING_P is true if every output vector is guaranteed to use the
- same permute vector.  We can handle that case for both variable-length
- and constant-length vectors, but we only handle other cases for
- constant-length vectors.
+  /* Set REPEATING_P to true if every output uses the same permute vector
+ and if we can generate the vectors in a vector-length agnostic way.
+
+ When REPEATING_P is true, NOUTPUTS holds the total number of outputs
+ that we actually need to generate.  */
+  uint64_t noutputs = 0;
+  loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!linfo
+  || !constant_multiple_p (LOOP_VINFO_VECT_FACTOR (linfo)
+  * SLP_TREE_LANES (node), nunits, &noutputs))
+repeating_p = false;
+
+  /* We can handle the conditions described for REPEATING_P above for
+ both variable- and constant-length vectors.  The fallback requires
+ us to generate every element of every permute vector explicitly,
+ which is only possible for constant-length permute vectors.
 
  Set:
 
  - NPATTERNS and NELTS_PER_PATTERN to the encoding of the permute
-   mask vector that we want to build.
+   mask vectors that we want to build.
 
  - NCOPIES to the number of copies of PERM that we need in order
-   to build the necessary permute mask vectors.
-
- - NOUTPUTS_PER_MASK to the number of output vectors we want to create
-   for each permute mask vector.  This is only relevant when GSI is
-   nonnull.  */
+   to build the necessary permute mask vectors.  */
   uint64_t npatterns;
   unsigned nelts_per_pattern;
   uint64_t ncopies;
-  unsigned noutputs_per_mask;
   if (repeating_p)
 {
   /* We need a single permute mask vector that has the form:
@@ -10274,7 +10281,6 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
 that we use for permutes requires 3n elements.  */
   npatterns = SLP_TREE_LANES (node);
   nelts_per_pattern = ncopies = 3;
-  noutputs_per_mask = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
 }
   else
 {
@@ -10284,10 +10290,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
  || !TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
return -1;
   nelts_per_pattern = ncopies = 1;
-  if (loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo))
-   if (!LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
- return -1;
-  noutputs_per_mask = 1;
+  if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
+   return -1;
 }
   unsigned olanes = ncopies * SLP_TREE_LANES (node);
   gcc_assert (repeating_p || multiple_p (olanes, nunits));
@@ -10364,16 +10368,24 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
   mask.quick_grow (count);
   vec_perm_indices indices;
   unsigned nperms = 0;
-  for (unsigned i = 0; i < vperm.length (); ++i)
-{
-  mask_element = vperm[i].second

[PATCH 1/4] vect: Variable lane indices in vectorizable_slp_permutation_1

2024-10-04 Thread Richard Sandiford

The main patch for PR116583 needs to create variable indices into
an input vector.  This pre-patch changes the types to allow that.

There is no pretty-print format for poly_uint64 because of issues
with passing C++ objects through "...".
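
The dump change at the end of the patch follows from that restriction: the integer fields go through a printf-style call and the variable lane index is printed separately. A stand-alone sketch of the same idea, using a hypothetical two-coefficient `PolyU64` model and an output format of my own choosing (not GCC's):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical model of a poly_uint64-style value: c0 + c1 * x,
// where x is only known at run time.
struct PolyU64
{
  std::uint64_t c0;
  std::uint64_t c1;
  bool is_constant () const { return c1 == 0; }
};

// No printf conversion exists for PolyU64, so format the plain
// integers with snprintf and append the lane index separately.
inline std::string
format_lane (unsigned op, unsigned vec, const PolyU64 &lane)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, " vops%u[%u][", op, vec);
  std::string out = buf;
  out += std::to_string (lane.c0);
  if (!lane.is_constant ())
    out += "+" + std::to_string (lane.c1) + "x";
  return out + "]";
}
```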

gcc/
PR tree-optimization/116583
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Use
poly_uint64 for scalar lane indices.
---
 gcc/tree-vect-slp.cc | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 482b9d50496..7aeda69f447 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10296,8 +10296,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
  from the { SLP operand, scalar lane } permutation as recorded in the
  SLP node as intermediate step.  This part should already work
  with SLP children with arbitrary number of lanes.  */
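-  auto_vec<std::pair<std::pair<unsigned, unsigned>, unsigned> > vperm;
-  auto_vec<unsigned> active_lane;
+  auto_vec<std::pair<std::pair<unsigned, unsigned>, poly_uint64>> vperm;
+  auto_vec<poly_uint64> active_lane;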
   vperm.create (olanes);
   active_lane.safe_grow_cleared (children.length (), true);
   for (unsigned i = 0; i < ncopies; ++i)
@@ -10312,8 +10312,9 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
{
  /* We checked above that the vectors are constant-length.  */
  unsigned vnunits = TYPE_VECTOR_SUBPARTS (vtype).to_constant ();
- unsigned vi = (active_lane[p.first] + p.second) / vnunits;
- unsigned vl = (active_lane[p.first] + p.second) % vnunits;
+ unsigned lane = active_lane[p.first].to_constant ();
+ unsigned vi = (lane + p.second) / vnunits;
+ unsigned vl = (lane + p.second) % vnunits;
  vperm.quick_push ({{p.first, vi}, vl});
}
}
@@ -10339,9 +10340,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
  ? multiple_p (i, npatterns)
  : multiple_p (i, TYPE_VECTOR_SUBPARTS (vectype
dump_printf (MSG_NOTE, ",");
- dump_printf (MSG_NOTE, " vops%u[%u][%u]",
-  vperm[i].first.first, vperm[i].first.second,
-  vperm[i].second);
+ dump_printf (MSG_NOTE, " vops%u[%u][",
+  vperm[i].first.first, vperm[i].first.second);
+ dump_dec (MSG_NOTE, vperm[i].second);
+ dump_printf (MSG_NOTE, "]");
}
   dump_printf (MSG_NOTE, "\n");
 }
-- 
2.25.1

[PATCH 0/4] Support more VLA SLP permutations

2024-10-04 Thread Richard Sandiford

This series should fix the target-independent parts of PR116583.
(We also need some target-specific patches, to be posted separately.)

The explanations are in the individual commit messages, but I've
attached a -b diff below in case my attempt to split the patch up
has just obfuscated things instead.

Tested on aarch64-linux-gnu (with and without SVE enabled by default)
and x86_64-linux-gnu.  Also tested by running the vect testsuite
with vect-force-slp=1.

Richard Sandiford (4):
  vect: Variable lane indices in vectorizable_slp_permutation_1
  vect: Restructure repeating_p case for SLP permutations
  vect: Support more VLA SLP permutations
  vect: Add more dump messages for VLA SLP permutation

 gcc/testsuite/gcc.dg/vect/slp-13-big-array.c |   2 +-
 gcc/testsuite/gcc.dg/vect/slp-13.c   |   2 +-
 gcc/tree-vect-slp.cc | 190 +--
 3 files changed, 134 insertions(+), 60 deletions(-)

-- 
2.25.1


diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c 
b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
index ca70856c1dd..e45f8aab133 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
@@ -137,4 +137,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c 
b/gcc/testsuite/gcc.dg/vect/slp-13.c
index b7f947e6dbe..d6346aef978 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13.c
@@ -131,4 +131,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
{ vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_pack_trunc } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 482b9d50496..56fb55cb628 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10194,6 +10194,13 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
   unsigned i;
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node));
+  /* True if we're permuting a single input of 2N vectors down
+ to N vectors.  This case doesn't generalize beyond 2 since
+ VEC_PERM_EXPR only takes 2 inputs.  */
+  bool pack_p = false;
+  /* If we're permuting inputs of N vectors each into X*N outputs,
+ this is the value of X, otherwise it is 1.  */
+  unsigned int unpack_factor = 1;
   tree op_vectype = NULL_TREE;
   FOR_EACH_VEC_ELT (children, i, child)
 if (SLP_TREE_VECTYPE (child))
@@ -10215,7 +10222,20 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
 "Unsupported vector types in lane permutation\n");
  return -1;
}
-  if (SLP_TREE_LANES (child) != SLP_TREE_LANES (node))
+  auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype);
+  unsigned int this_unpack_factor;
+  /* Check whether the input has twice as many lanes per vector.  */
+  if (children.length () == 1
+ && known_eq (SLP_TREE_LANES (child) * nunits,
+  SLP_TREE_LANES (node) * op_nunits * 2))
+   pack_p = true;
+  /* Check whether the output has N times as many lanes per vector.  */
+  else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
+   SLP_TREE_LANES (child) * nunits,
+   &this_unpack_factor)
+  && (i == 0 || unpack_factor == this_unpack_factor))
+   unpack_factor = this_unpack_factor;
+  else
repeating_p = false;
 }
 
@@ -10243,29 +10263,47 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
   return 1;
 }
 
-  /* REPEATING_P is true if every output vector is guaranteed to use the
- same permute vector.  We can han

[PATCH 4/4] vect: Add more dump messages for VLA SLP permutation

2024-10-04 Thread Richard Sandiford

Taking the !repeating_p route for VLA vectors causes analysis
to fail, but it wasn't clear from the dump files when this
had happened, and which node caused it.

gcc/
PR tree-optimization/116583
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Add more
dump messages.
---
 gcc/tree-vect-slp.cc | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 66f5906ebb9..56fb55cb628 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10319,10 +10319,22 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
 instead of relying on the pattern described above.  */
   if (!nunits.is_constant (&npatterns)
  || !TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
-   return -1;
+   {
+ if (dump_p)
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"unsupported permutation %p on variable-length"
+" vectors\n", (void *) node);
+ return -1;
+   }
   nelts_per_pattern = ncopies = 1;
   if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
-   return -1;
+   {
+ if (dump_p)
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"unsupported permutation %p for variable VF\n",
+(void *) node);
+ return -1;
+   }
   pack_p = false;
   unpack_factor = 1;
 }
-- 
2.25.1

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-04 Thread Jonathan Wakely

On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely  wrote:
>
> On Fri, 4 Oct 2024 at 07:53, Richard Biener  
> wrote:
> >
> > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely  wrote:
> > >
> > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> > > >
> > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > > > >
> > > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > > >
> > > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > > >
> > > > > libstdc++-v3/ChangeLog:
> > > > >
> > > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > > >   loop using switch statement.
> > > > >
> > > > > Signed-off-by: Dmitry Ilvokhin 
> > > > > ---
> > > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > index 3665375096a..294a7323dd0 100644
> > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > @@ -50,10 +50,29 @@ namespace
> > > > >load_bytes(const char* p, int n)
> > > > >{
> > > > >  std::size_t result = 0;
> > > > > ---n;
> > > > > -do
> > > > > -  result = (result << 8) + static_cast(p[n]);
> > > > > -while (--n >= 0);
> > > >
> > > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > > only hash the first 8 bytes.
> > >
> > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> >
> > The compiler should do such transforms - you probably want to tell
> > it that n < 8 though, it likely doesn't (always) know.
>
> e.g. like this?
>
> if ((n & 7) != n)
>   __builtin_unreachable();
>
> For the microbenchmark that seems to make things consistently worse:
> https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw

Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <=
n is always true too.
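
For anyone following along without the patch body, here is a stand-alone sketch of the two shapes being compared. The loop matches the code quoted above; the switch version is my own guess at what a manually unrolled form looks like, not the submitted patch, and it uses a fixed 64-bit result type to stay portable.

```cpp
#include <cstdint>

// Loop form, as quoted in the thread (requires 1 <= n).
inline std::uint64_t
load_bytes_loop (const char *p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char> (p[n]);
  while (--n >= 0);
  return result;
}

// Hypothetical unrolled form: enter at the highest byte and fall
// through to the lowest, so byte k lands at bit position 8 * k.
inline std::uint64_t
load_bytes_switch (const char *p, int n)
{
  std::uint64_t result = 0;
  switch (n & 7)
    {
    case 7: result |= std::uint64_t (static_cast<unsigned char> (p[6])) << 48; // fall through
    case 6: result |= std::uint64_t (static_cast<unsigned char> (p[5])) << 40; // fall through
    case 5: result |= std::uint64_t (static_cast<unsigned char> (p[4])) << 32; // fall through
    case 4: result |= std::uint64_t (static_cast<unsigned char> (p[3])) << 24; // fall through
    case 3: result |= std::uint64_t (static_cast<unsigned char> (p[2])) << 16; // fall through
    case 2: result |= std::uint64_t (static_cast<unsigned char> (p[1])) << 8;  // fall through
    case 1: result |= std::uint64_t (static_cast<unsigned char> (p[0]));
    }
  return result;
}
```

Both forms should agree for every tail length from 1 to 7; whether the switch is actually faster is exactly what the linked benchmarks are trying to establish.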

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-04 Thread Jonathan Wakely

On Fri, 4 Oct 2024 at 07:53, Richard Biener  wrote:
>
> On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely  wrote:
> >
> > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> > >
> > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > > >
> > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > >
> > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > >   loop using switch statement.
> > > >
> > > > Signed-off-by: Dmitry Ilvokhin 
> > > > ---
> > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > index 3665375096a..294a7323dd0 100644
> > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > @@ -50,10 +50,29 @@ namespace
> > > >load_bytes(const char* p, int n)
> > > >{
> > > >  std::size_t result = 0;
> > > > ---n;
> > > > -do
> > > > -  result = (result << 8) + static_cast(p[n]);
> > > > -while (--n >= 0);
> > >
> > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > only hash the first 8 bytes.
> >
> > Ah, but it's only ever called with load_bytes(end, len & 0x7)
>
> The compiler should do such transforms - you probably want to tell
> it that n < 8 though, it likely doesn't (always) know.

e.g. like this?

if ((n & 7) != n)
  __builtin_unreachable();

For the microbenchmark that seems to make things consistently worse:
https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
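
For reference, the two shapes being compared in this thread can be sketched
as below.  This is only an illustration, not the code from the patch: the
function names load_bytes_loop and load_bytes_switch are made up, and
std::uint64_t is used instead of std::size_t so that the shifts are well
defined regardless of the target.  Both forms assume 1 <= n <= 7, which is
what the caller guarantees according to the discussion above.

```cpp
#include <cstdint>

// Loop form, modelled on the lines removed by the patch.
// Precondition (assumed, as discussed in the thread): 1 <= n <= 7.
inline std::uint64_t load_bytes_loop(const char* p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Hypothetical switch form: every case falls through to the next one, so
// entering at case n loads bytes n-1 .. 0.
inline std::uint64_t load_bytes_switch(const char* p, int n)
{
  // Range hint in the spirit of the suggestion above (GCC/Clang builtin).
  if (!(1 <= n && n < 8))
    __builtin_unreachable();

  std::uint64_t result = 0;
  switch (n)
    {
    case 7: result |= std::uint64_t(static_cast<unsigned char>(p[6])) << 48; [[fallthrough]];
    case 6: result |= std::uint64_t(static_cast<unsigned char>(p[5])) << 40; [[fallthrough]];
    case 5: result |= std::uint64_t(static_cast<unsigned char>(p[4])) << 32; [[fallthrough]];
    case 4: result |= std::uint64_t(static_cast<unsigned char>(p[3])) << 24; [[fallthrough]];
    case 3: result |= std::uint64_t(static_cast<unsigned char>(p[2])) << 16; [[fallthrough]];
    case 2: result |= std::uint64_t(static_cast<unsigned char>(p[1])) << 8;  [[fallthrough]];
    case 1: result |= std::uint64_t(static_cast<unsigned char>(p[0]));
    }
  return result;
}
```

Both functions should return the same value for every n in [1, 7]; whether
the switch form or the range hint is actually faster is exactly what the
microbenchmarks linked above are meant to measure.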

Re: [PATCH] middle-end: reorder masking priority of math functions

2024-10-04 Thread Victor Do Nascimento

On 10/4/24 09:32, Tamar Christina wrote:

Hi Victor,

-----Original Message-----
From: Victor Do Nascimento 
Sent: Wednesday, October 2, 2024 5:26 PM
To: gcc-patches@gcc.gnu.org
Cc: Tamar Christina ; richard.guent...@gmail.com;
Victor Do Nascimento 
Subject: [PATCH] middle-end: reorder masking priority of math functions

Given the categorization of math built-in functions as `ECF_CONST',
when if-converting their uses, their calls are not masked and are thus
called with an all-true predicate.

This, however, is not appropriate where built-ins have library
equivalents, wherein they may exhibit highly architecture-specific
behaviors. For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.

As numerical simulation programs often do bounds checks on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code.  If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.

We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.

I think this makes sense to me from a technical standpoint and from an SVE
one.  Though I think the original order may have been there because of the
assumption that on some uarches unpredicated implementations are faster than
predicated ones.

So there may be some concerns about this order being slower for some.
I'll leave it up to Richi since e.g. I don't know the perf characteristics of
the x86 variants here, but if there is a concern you could use the
conditional_operation_is_expensive target hook to decide on the preferred order.

But other than that the change itself looks good to me, but you still need
approval.

Cheers,
Tamar

Thank you very much for your input here, Tamar.

Yes, I do agree that this solution may well not be the best path forward 
for all architectures and that is something that has indeed crossed my 
mind before.

Nonetheless, I did think that the best way to get further feedback on 
the matter was to present this initial proposal to which others could 
respond as they saw fit regarding the performance characteristics in 
other architectures.

Let's see what Richi has to say. If necessary we can, as you rightly 
suggested, resort to the use of the `conditional_operation_is_expensive' 
target hook.

Many thanks once again,
Victor
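
The performance concern discussed above can be illustrated with a scalar
sketch.  Everything in it is hypothetical: expf_model stands in for a
vector math routine whose out-of-range inputs take a slow special-case
path, the 88.0f threshold is arbitrary, and CallStats only counts which
path was taken.  It is not how tree-if-conv.cc works internally; it only
models the difference between calling the function for every lane and
calling it under a mask.

```cpp
#include <cmath>
#include <cstddef>

// Counts how often the modelled routine takes each path (illustration only).
struct CallStats
{
  int fast = 0;
  int slow = 0;
};

// Hypothetical stand-in for a vectorized expf: inputs above an arbitrary
// threshold are assumed to be handled by a slow fallback routine.
inline float expf_model(float x, CallStats& stats)
{
  if (x > 88.0f)
    ++stats.slow;
  else
    ++stats.fast;
  return std::exp(x);
}

// Unmasked if-conversion: the call runs for every element and the result
// is selected afterwards, so out-of-bounds inputs still reach the routine.
inline void convert_unmasked(const float* a, float* b, std::size_t n,
                             float lim, float cst, CallStats& stats)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      float e = expf_model(a[i], stats);  // executed even if discarded
      b[i] = (a[i] > lim) ? cst : e;
    }
}

// Masked form: the call only happens for elements that pass the bounds
// check, mirroring what a predicated call would do.
inline void convert_masked(const float* a, float* b, std::size_t n,
                           float lim, float cst, CallStats& stats)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      bool active = !(a[i] > lim);
      b[i] = active ? expf_model(a[i], stats) : cst;
    }
}
```

With an input where half the elements exceed lim, both functions produce
the same output array, but only the unmasked one ever reaches the modelled
slow path.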

Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
new regressions.

gcc/ChangeLog:

* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
function declaration before IFN fallback.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fncall-mask-math.c: New.
---
  .../gcc.dg/vect/vect-fncall-mask-math.c   | 33 +++
  gcc/tree-if-conv.cc   | 18 +-
  2 files changed, 42 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
new file mode 100644
index 000..15e22da2807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
@@ -0,0 +1,33 @@
+/* Test the correct application of masking to autovectorized math function calls.
+   Test is currently set to xfail pending the release of the relevant lmvec
+   support. */
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */
+
+#include <math.h>
+
+const int N = 20;
+const float lim = 101.0;
+const float cst =  -1.0;
+float tot =   0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+   [10 ... 19] = 100.0 };/* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+{
+  if (a[i] > lim)
+   b[i] = cst;
+  else
+   b[i] = expf (a[i]);
+  tot += b[i];
+}
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { xfail 
{
aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} 
ifcvt
{ xfail { aarch64*-*-* } } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..90c754a4814 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt,
vec refs)

  case GIMPLE_CALL:
{
-   /* There are some IFN_s that are used to replace builtins but have

Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-04 Thread Kyrylo Tkachov



> On 3 Oct 2024, at 21:44, Christophe Lyon  wrote:
> 
> When --enable-werror is enabled when running the top-level configure,
> it passes --enable-werror-always to subdirs.  Some of them, like
> libgcc, ignore it.
> 
> This patch adds support for it, enabled only for aarch64, to avoid
> breaking bootstrap for other targets.
> 

The aarch64 part is ok but you’ll need a wider libgcc approval.
It seems to me that if libgcc is intended to compile cleanly with -Werror then 
it should be a libgcc-wide change, but maybe doing it port-by-port is the only 
practical way of getting there?
Thanks,
Kyrill


> The patch also adds -Wno-prio-ctor-dtor to avoid a warning when compiling 
> lse_init.c
> 
>libgcc/
>* Makefile.in (WERROR): New.
>* config/aarch64/t-aarch64: Handle WERROR. Always use
>-Wno-prio-ctor-dtor.
>* configure.ac: Add support for --enable-werror-always.
>* configure: Regenerate.
> ---
> libgcc/Makefile.in  |  1 +
> libgcc/config/aarch64/t-aarch64 |  1 +
> libgcc/configure| 31 +++
> libgcc/configure.ac |  5 +
> 4 files changed, 38 insertions(+)
> 
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 0e46e9ef768..eca62546642 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -84,6 +84,7 @@ AR_FLAGS = rc
> 
> CC = @CC@
> CFLAGS = @CFLAGS@
> +WERROR = @WERROR@
> RANLIB = @RANLIB@
> LN_S = @LN_S@
> 
> diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
> index b70e7b94edd..ae1588ce307 100644
> --- a/libgcc/config/aarch64/t-aarch64
> +++ b/libgcc/config/aarch64/t-aarch64
> @@ -30,3 +30,4 @@ LIB2ADDEH += \
>$(srcdir)/config/aarch64/__arm_za_disable.S
> 
> SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
> +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
> diff --git a/libgcc/configure b/libgcc/configure
> index cff1eff9625..ae56f7dbdc9 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -592,6 +592,7 @@ enable_execute_stack
> asm_hidden_op
> extra_parts
> cpu_type
> +WERROR
> get_gcc_base_ver
> HAVE_STRUB_SUPPORT
> thread_header
> @@ -719,6 +720,7 @@ enable_tm_clone_registry
> with_glibc_version
> enable_tls
> with_gcc_major_version_only
> +enable_werror_always
> '
>   ac_precious_vars='build_alias
> host_alias
> @@ -1361,6 +1363,7 @@ Optional Features:
>   installations without PT_GNU_EH_FRAME support
>   --disable-tm-clone-registrydisable TM clone registry
>   --enable-tlsUse thread-local storage [default=yes]
> +  --enable-werror-always  enable -Werror despite compiler version
> 
> Optional Packages:
>   --with-PACKAGE[=ARG]use PACKAGE [ARG=yes]
> @@ -5808,6 +5811,34 @@ fi
> 
> 
> 
> +# Only enable with --enable-werror-always until existing warnings are
> +# corrected.
> +ac_ext=c
> +ac_cpp='$CPP $CPPFLAGS'
> +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
> +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS 
> conftest.$ac_ext $LIBS >&5'
> +ac_compiler_gnu=$ac_cv_c_compiler_gnu
> +
> +WERROR=
> +# Check whether --enable-werror-always was given.
> +if test "${enable_werror_always+set}" = set; then :
> +  enableval=$enable_werror_always;
> +else
> +  enable_werror_always=no
> +fi
> +
> +if test $enable_werror_always = yes; then :
> +  WERROR="$WERROR${WERROR:+ }-Werror"
> +fi
> +
> +ac_ext=c
> +ac_cpp='$CPP $CPPFLAGS'
> +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
> +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS 
> conftest.$ac_ext $LIBS >&5'
> +ac_compiler_gnu=$ac_cv_c_compiler_gnu
> +
> +
> +
> # Substitute configuration variables
> 
> 
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index 4e8c036990f..6b3ea2aea5c 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
> sinclude(../config/gthr.m4)
> sinclude(../config/sjlj.m4)
> sinclude(../config/cet.m4)
> +sinclude(../config/warnings.m4)
> 
> AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
> AC_CONFIG_SRCDIR([static-object.mk])
> @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
> # Determine what GCC version number to use in filesystem paths.
> GCC_BASE_VER
> 
> +# Only enable with --enable-werror-always until existing warnings are
> +# corrected.
> +ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual])
> +
> # Substitute configuration variables
> AC_SUBST(cpu_type)
> AC_SUBST(extra_parts)
> --
> 2.34.1
>

Re: [PATCH 0/2] aarch64: remove SVE2 requirement from SME and diagnose it as unsupported

2024-10-04 Thread Kyrylo Tkachov

Hi Andre,

> On 2 Oct 2024, at 19:13, Andre Vieira  wrote:
> 
> This patch series removes the requirement of SVE2 for SME, so when a user
> passes +sme, SVE2 is not enabled as a result of that.
> We do this to be compliant with the ISA and behave in a compatible manner to
> other toolchains, to prevent unexpected behavior when switching between them.
> 
> However, for the time being we diagnose the use of SME without SVE2 as
> unsupported.  We suspect that the backend correctly enables and disables the
> right instructions given the options, but we believe that certain codegen
> assumes that SVE and SVE2 are present when using SME.  We should investigate
> these assumptions before we fully support this combination.

Is that something you intend to do for GCC 15.1?
I’m not a fan of the warning in patch [2/2].
If the compiler is at risk of crashing, generating wrong code, or emitting SVE 
code in non-streaming regions or other such violations then we should mark it 
as unsupported with an error.
Usually diagnostics about “I could support it but I don’t” use the sorry () API 
for this reason.

Thanks,
Kyrill

> 
> The patch series also refactors the FCMA/COMPNUM/TARGET_COMPLEX feature to
> separate it from Armv8.3-A feature set.
> 
> Andre Vieira (2)
> aarch64: Split FCMA feature bit from Armv8.3-A
> aarch64: remove SVE2 requirement from SME and diagnose it as unsupported
> 
> Regression tested on aarch64-none-linux-gnu.
> 
> OK for trunk?
> 
> Andre Vieira (2):
>  aarch64: Split FCMA feature bit from Armv8.3-A
>  aarch64: remove SVE2 requirement from SME and diagnose it as unsupported
> 
> gcc/config/aarch64/aarch64-arches.def | 2 +-
> gcc/config/aarch64/aarch64-option-extensions.def  | 4 +++-
> gcc/config/aarch64/aarch64.cc | 4 
> gcc/config/aarch64/aarch64.h  | 2 +-
> .../aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c| 2 +-
> .../aarch64/sve/acle/general-c/binary_opt_single_n_2.c| 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/binary_single_1.c   | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c| 2 +-
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/clamp_1.c | 2 +-
> .../aarch64/sve/acle/general-c/compare_scalar_count_1.c   | 2 +-
> .../aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c   | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/storexn_1.c | 2 +-
> .../aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c | 2 +-
> 15 files changed, 20 insertions(+), 14 deletions(-)
> 
> --
> 2.25.1
>

Re: [PATCH] c++/modules: Merge default arguments [PR99274]

2024-10-04 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660134.html.

On Thu, Sep 12, 2024 at 01:35:38PM -0400, Patrick Palka wrote:
> On Fri, 23 Aug 2024, Nathaniel Shead wrote:
> 
> > On Thu, Aug 22, 2024 at 02:20:14PM -0400, Patrick Palka wrote:
> > > On Mon, 12 Aug 2024, Nathaniel Shead wrote:
> > > 
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > 
> > > > I tried to implement a remapping of the slots for TARGET_EXPRs for the
> > > > FIXME but I wasn't able to work out how to do so effectively.  Given
> > > > that I doubt this will be a common issue I felt probably easiest to
> > > > leave it for now and focus on other issues in the meantime; thoughts?
> > > > 
> > > > The other thing to note is that most of this function just has a single
> > > > error message always indicated by a 'goto mismatch;' but I felt that it
> > > > seemed reasonable to provide more specific error messages where we can.
> > > > But given that in the long term we probably want to replace this
> > > > function with an appropriately enhanced 'duplicate_decls' anyway maybe
> > > > it's not worth worrying about; this patch is still useful in the
> > > > meantime if only for the testcases, I hope.
> > > > 
> > > > -- >8 --
> > > > 
> > > > When merging a newly imported declaration with an existing declaration
> > > > we don't currently propagate new default arguments, which causes issues
> > > > when modularising header units.  This patch adds logic to propagate
> > > > default arguments to existing declarations on import, and error if the
> > > > defaults do not match.
> > > > 
> > > > PR c++/99274
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * module.cc (trees_in::is_matching_decl): Merge default
> > > > arguments.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/modules/default-arg-1_a.H: New test.
> > > > * g++.dg/modules/default-arg-1_b.C: New test.
> > > > * g++.dg/modules/default-arg-2_a.H: New test.
> > > > * g++.dg/modules/default-arg-2_b.C: New test.
> > > > * g++.dg/modules/default-arg-3.h: New test.
> > > > * g++.dg/modules/default-arg-3_a.H: New test.
> > > > * g++.dg/modules/default-arg-3_b.C: New test.
> > > > 
> > > > Signed-off-by: Nathaniel Shead 
> > > > ---
> > > >  gcc/cp/module.cc  | 62 ++-
> > > >  .../g++.dg/modules/default-arg-1_a.H  | 17 +
> > > >  .../g++.dg/modules/default-arg-1_b.C  | 26 
> > > >  .../g++.dg/modules/default-arg-2_a.H  | 17 +
> > > >  .../g++.dg/modules/default-arg-2_b.C  | 28 +
> > > >  gcc/testsuite/g++.dg/modules/default-arg-3.h  | 13 
> > > >  .../g++.dg/modules/default-arg-3_a.H  |  5 ++
> > > >  .../g++.dg/modules/default-arg-3_b.C  |  6 ++
> > > >  8 files changed, 171 insertions(+), 3 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-1_a.H
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-1_b.C
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-2_a.H
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-2_b.C
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3.h
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3_a.H
> > > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3_b.C
> > > > 
> > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > > index f4d137b13a1..87f34bac578 100644
> > > > --- a/gcc/cp/module.cc
> > > > +++ b/gcc/cp/module.cc
> > > > @@ -11551,8 +11551,6 @@ trees_in::is_matching_decl (tree existing, tree 
> > > > decl, bool is_typedef)
> > > >  
> > > >   if (!same_type_p (TREE_VALUE (d_args), TREE_VALUE (e_args)))
> > > > goto mismatch;
> > > > -
> > > > - // FIXME: Check default values
> > > > }
> > > >  
> > > >/* If EXISTING has an undeduced or uninstantiated exception
> > > > @@ -11690,7 +11688,65 @@ trees_in::is_matching_decl (tree existing, 
> > > > tree decl, bool is_typedef)
> > > >if (!DECL_EXTERNAL (d_inner))
> > > >  DECL_EXTERNAL (e_inner) = false;
> > > >  
> > > > -  // FIXME: Check default tmpl and fn parms here
> > > > +  if (TREE_CODE (decl) == TEMPLATE_DECL)
> > > > +{
> > > > +  /* Merge default template arguments.  */
> > > > +  tree d_parms = DECL_INNERMOST_TEMPLATE_PARMS (decl);
> > > > +  tree e_parms = DECL_INNERMOST_TEMPLATE_PARMS (existing);
> > > > +  gcc_checking_assert (TREE_VEC_LENGTH (d_parms)
> > > > +  == TREE_VEC_LENGTH (e_parms));
> > > > +  for (int i = 0; i < TREE_VEC_LENGTH (d_parms); ++i)
> > > > +   {
> > > > + tree d_default = TREE_PURPOSE (TREE_VEC_ELT (d_parms, i));
> > > > + tree& e_default = TREE_PURPOSE (TREE_VEC_ELT (e_parms, i));
> > > > + if (e_default == NULL_TREE)
> > > > +   e_d

Re: [PATCH v5] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-10-04 Thread Ken Matsui

Ping for -Wno-pragma-once-outside-header.

On Thursday, June 27th, 2024 at 11:00 AM, Ken Matsui wrote:

> 
> 
> Ping.
> 
> 
> On Sat, Jun 15, 2024 at 10:30 PM Ken Matsui kmat...@gcc.gnu.org wrote:
> 
> > This patch adds a warning switch for "#pragma once in main file". The
> > warning option name is Wpragma-once-outside-header, which is the same
> > as Clang provides.
> > 
> > PR preprocessor/89808
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c.opt (Wpragma_once_outside_header): Define new option.
> > * c.opt.urls: Regenerate.
> > 
> > gcc/ChangeLog:
> > 
> > * doc/invoke.texi (Warning Options): Document
> > -Wno-pragma-once-outside-header.
> > 
> > libcpp/ChangeLog:
> > 
> > * include/cpplib.h (cpp_warning_reason): Define
> > CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> > * directives.cc (do_pragma_once): Use
> > CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
> > * g++.dg/warn/Wpragma-once-outside-header.C: New test.
> > 
> > Signed-off-by: Ken Matsui kmat...@gcc.gnu.org
> > ---
> > gcc/c-family/c.opt | 4 
> > gcc/c-family/c.opt.urls | 3 +++
> > gcc/doc/invoke.texi | 10 --
> > .../g++.dg/warn/Wno-pragma-once-outside-header.C | 5 +
> > .../g++.dg/warn/Wpragma-once-outside-header.C | 6 ++
> > libcpp/directives.cc | 3 ++-
> > libcpp/include/cpplib.h | 3 ++-
> > 7 files changed, 30 insertions(+), 4 deletions(-)
> > create mode 100644 
> > gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> > create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> > 
> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index 403abc1f26e..3439f36fe45 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -1188,6 +1188,10 @@ Wpragmas
> > C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
> > Warn about misuses of pragmas.
> > 
> > +Wpragma-once-outside-header
> > +C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) 
> > CppReason(CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER) Init(1) Warning
> > +Warn about #pragma once outside of a header.
> > +
> > Wprio-ctor-dtor
> > C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
> > Warn if constructor or destructors with priorities from 0 to 100 are used.
> > diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
> > index dd455d7c0dc..778ca08be2e 100644
> > --- a/gcc/c-family/c.opt.urls
> > +++ b/gcc/c-family/c.opt.urls
> > @@ -672,6 +672,9 @@ 
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-pointer-to-int-cast)
> > Wpragmas
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-pragmas)
> > 
> > +Wpragma-once-outside-header
> > +UrlSuffix(gcc/Warning-Options.html#index-Wno-pragma-once-outside-header)
> > +
> > Wprio-ctor-dtor
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-prio-ctor-dtor)
> > 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 9456ced468a..c7f17ca9eb7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
> > -Wpacked -Wno-packed-bitfield-compat -Wpacked-not-aligned -Wpadded
> > -Wparentheses -Wno-pedantic-ms-format
> > -Wpointer-arith -Wno-pointer-compare -Wno-pointer-to-int-cast
> > --Wno-pragmas -Wno-prio-ctor-dtor -Wredundant-decls
> > --Wrestrict -Wno-return-local-addr -Wreturn-type
> > +-Wno-pragmas -Wno-pragma-once-outside-header -Wno-prio-ctor-dtor
> > +-Wredundant-decls -Wrestrict -Wno-return-local-addr -Wreturn-type
> > -Wno-scalar-storage-order -Wsequence-point
> > -Wshadow -Wshadow=global -Wshadow=local -Wshadow=compatible-local
> > -Wno-shadow-ivar
> > @@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as 
> > incorrect parameters,
> > invalid syntax, or conflicts between pragmas. See also
> > @option{-Wunknown-pragmas}.
> > 
> > +@opindex Wno-pragma-once-outside-header
> > +@opindex Wpragma-once-outside-header
> > +@item -Wno-pragma-once-outside-header
> > +Do not warn when @code{#pragma once} is used in a file that is not a header
> > +file, such as a main file.
> > +
> > @opindex Wno-prio-ctor-dtor
> > @opindex Wprio-ctor-dtor
> > @item -Wno-prio-ctor-dtor
> > diff --git a/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C 
> > b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> > new file mode 100644
> > index 000..b5be4d25a9d
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> > @@ -0,0 +1,5 @@
> > +// { dg-do assemble }
> > +// { dg-options "-Wno-pragma-once-outside-header" }
> > +
> > +#pragma once
> > +int main() {}
> > diff --git a/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C 
> > b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> > new file mode 100644
> > index 000..29f09b69f71
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> > @@ -0,0 +1,6 @@
> > +// { dg-do assemble }
> > +// { dg-options "-Werror=pragma-once-outside-header" }
> > +// { dg-message "some

Re: [PATCH 1/3] c++: Handle ABI for non-polymorphic dynamic classes

2024-10-04 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660956.html.

On Wed, Aug 21, 2024 at 09:38:44AM +1000, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> The Itanium ABI has specific rules for when virtual tables for dynamic
> classes should be emitted.  However we didn't consider structures with
> virtual inheritance but no virtual members as dynamic classes for ABI
> purposes; this patch fixes this.
> 
> gcc/cp/ChangeLog:
> 
>   * decl2.cc (import_export_class): Use TYPE_CONTAINS_VPTR_P
>   instead of TYPE_POLYMORPHIC_P.
>   (import_export_decl): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/virt-5_a.C: New test.
>   * g++.dg/modules/virt-5_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/decl2.cc |  4 ++--
>  gcc/testsuite/g++.dg/modules/virt-5_a.C | 16 
>  gcc/testsuite/g++.dg/modules/virt-5_b.C | 11 +++
>  3 files changed, 29 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/virt-5_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/virt-5_b.C
> 
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index e9ae979896c..af544f40dac 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -2431,7 +2431,7 @@ import_export_class (tree ctype)
> translation unit, then export the class; otherwise, import
> it.  */
>import_export = -1;
> -  else if (TYPE_POLYMORPHIC_P (ctype))
> +  else if (TYPE_CONTAINS_VPTR_P (ctype))
>  {
>tree cdecl = TYPE_NAME (ctype);
>if (DECL_LANG_SPECIFIC (cdecl) && DECL_MODULE_ATTACH_P (cdecl))
> @@ -3527,7 +3527,7 @@ import_export_decl (tree decl)
> class_type = type;
> import_export_class (type);
> if (CLASSTYPE_INTERFACE_KNOWN (type)
> -   && TYPE_POLYMORPHIC_P (type)
> +   && TYPE_CONTAINS_VPTR_P (type)
> && CLASSTYPE_INTERFACE_ONLY (type)
> /* If -fno-rtti was specified, then we cannot be sure
>that RTTI information will be emitted with the
> diff --git a/gcc/testsuite/g++.dg/modules/virt-5_a.C 
> b/gcc/testsuite/g++.dg/modules/virt-5_a.C
> new file mode 100644
> index 000..f4c6abe85ef
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/virt-5_a.C
> @@ -0,0 +1,16 @@
> +// { dg-additional-options "-fmodules-ts" }
> +// { dg-module-cmi M }
> +
> +export module M;
> +
> +struct C {};
> +struct B : virtual C {};
> +
> +// Despite no non-inline key function, this is still a dynamic class
> +// and so by the Itanium ABI 5.2.3 should be uniquely emitted in this TU
> +export struct A : B {
> +  inline A (int) {}
> +};
> +
> +// { dg-final { scan-assembler {_ZTTW1M1A:} } }
> +// { dg-final { scan-assembler {_ZTVW1M1A:} } }
> diff --git a/gcc/testsuite/g++.dg/modules/virt-5_b.C 
> b/gcc/testsuite/g++.dg/modules/virt-5_b.C
> new file mode 100644
> index 000..785dd92ac1e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/virt-5_b.C
> @@ -0,0 +1,11 @@
> +// { dg-module-do link }
> +// { dg-additional-options "-fmodules-ts" }
> +
> +import M;
> +
> +int main() {
> +  A a(0);
> +}
> +
> +// { dg-final { scan-assembler-not {_ZTTW1M1A:} } }
> +// { dg-final { scan-assembler-not {_ZTVW1M1A:} } }
> -- 
> 2.43.2
>
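
The distinction this patch relies on can be seen from standard C++ alone.
The sketch below is an illustration under the assumption of an
Itanium-style ABI: a class with a virtual base but no virtual functions is
not polymorphic in the language sense, yet it still carries a hidden
pointer used to locate the virtual base.  The struct names follow the
testcase above; P is an extra polymorphic class added for contrast.

```cpp
#include <type_traits>

struct C {};
struct B : virtual C {};                // virtual base, no virtual functions
struct P { virtual ~P() = default; };   // polymorphic, for contrast

// B declares and inherits no virtual functions, so it is not polymorphic.
static_assert(!std::is_polymorphic<B>::value,
              "B has no virtual functions");
static_assert(std::is_polymorphic<P>::value,
              "P has a virtual destructor");

// B nevertheless needs room for a pointer to find its virtual base, so it
// is at least pointer-sized even though both B and C declare no members.
static_assert(sizeof(B) >= sizeof(void*),
              "B carries a hidden pointer for its virtual base");
```

A check based only on "is polymorphic" therefore misses classes like B,
which is why the patch switches to a test that also covers classes with
virtual bases.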

Re: [PATCH 2/3] c++/modules: Prevent maybe_clone_decl being called multiple times [PR115007]

2024-10-04 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660957.html

On Wed, Aug 21, 2024 at 09:40:25AM +1000, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> The ICE in the linked PR is caused because maybe_clone_decl is not
> prepared to be called on a declaration that has already had clones
> created; what happens otherwise is that start_preparsed_function early
> exits and never sets up cfun, causing a segfault later on.
> 
> To fix this we ensure that post_load_processing only calls
> maybe_clone_decl if TREE_ASM_WRITTEN has not been marked on the
> declaration yet, and (if maybe_clone_decls succeeds) marks this flag on
> the decl so that it doesn't get called again later when finalising
> deferred vague linkage declarations in c_parse_final_cleanups.
> 
> As a bonus this now allows us to only keep the DECL_SAVED_TREE around in
> expand_or_defer_fn_1 for modules which have CMIs, which will have
> benefits for LTO performance in non-interface TUs.
> 
> For clarity we also update the streaming code to do post_load_decls for
> maybe in-charge cdtors rather than any DECL_ABSTRACT_P declaration, as
> this is more accurate to the decls affected by maybe_clone_body.
> 
>   PR c++/115007
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (module_state::read_cluster): Replace
>   DECL_ABSTRACT_P with DECL_MAYBE_IN_CHARGE_CDTOR_P.
>   (post_load_processing): Check and mark TREE_ASM_WRITTEN.
>   * semantics.cc (expand_or_defer_fn_1): Use the more specific
>   module_maybe_has_cmi_p instead of modules_p.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/virt-6_a.C: New test.
>   * g++.dg/modules/virt-6_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc|  7 ---
>  gcc/cp/semantics.cc |  2 +-
>  gcc/testsuite/g++.dg/modules/virt-6_a.C | 13 +
>  gcc/testsuite/g++.dg/modules/virt-6_b.C |  6 ++
>  4 files changed, 24 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/virt-6_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/virt-6_b.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 7c42aea05ee..5cd4f313933 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -15525,7 +15525,7 @@ module_state::read_cluster (unsigned snum)
>  
>if (abstract)
>   ;
> -  else if (DECL_ABSTRACT_P (decl))
> +  else if (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl))
>   vec_safe_push (post_load_decls, decl);
>else
>   {
> @@ -17947,10 +17947,11 @@ post_load_processing ()
>  
>dump () && dump ("Post-load processing of %N", decl);
>  
> -  gcc_checking_assert (DECL_ABSTRACT_P (decl));
> +  gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl));
>/* Cloning can cause loading -- specifically operator delete for
>the deleting dtor.  */
> -  maybe_clone_body (decl);
> +  if (!TREE_ASM_WRITTEN (decl) && maybe_clone_body (decl))
> + TREE_ASM_WRITTEN (decl) = 1;
>  }
>  
>cfun = old_cfun;
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 5ab2076b673..f7ae8e68dcf 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -5122,7 +5122,7 @@ expand_or_defer_fn_1 (tree fn)
>demand, so we also need to keep the body.  Otherwise we don't
>need it anymore.  */
>if (!DECL_DECLARED_CONSTEXPR_P (fn)
> -   && !(modules_p () && vague_linkage_p (fn)))
> +   && !(module_maybe_has_cmi_p () && vague_linkage_p (fn)))
>   DECL_SAVED_TREE (fn) = NULL_TREE;
>return false;
>  }
> diff --git a/gcc/testsuite/g++.dg/modules/virt-6_a.C 
> b/gcc/testsuite/g++.dg/modules/virt-6_a.C
> new file mode 100644
> index 000..68e466ace3f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/virt-6_a.C
> @@ -0,0 +1,13 @@
> +// PR c++/115007
> +// { dg-additional-options "-fmodules-ts -Wno-global-module" }
> +// { dg-module-cmi M:a }
> +
> +module;
> +struct S {
> +  virtual ~S() = default;
> +  virtual void f() = 0;
> +};
> +module M:a;
> +extern S* p;
> +template <typename T> void format(T) { p->~S(); }
> +template void format(int);
> diff --git a/gcc/testsuite/g++.dg/modules/virt-6_b.C 
> b/gcc/testsuite/g++.dg/modules/virt-6_b.C
> new file mode 100644
> index 000..c53f5fac742
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/virt-6_b.C
> @@ -0,0 +1,6 @@
> +// PR c++/115007
> +// { dg-additional-options "-fmodules-ts" }
> +// { dg-module-cmi M }
> +
> +export module M;
> +import :a;
> -- 
> 2.43.2
>

Re: [PATCH 3/3] c++/modules: Support decloned cdtors

2024-10-04 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660958.html

On Wed, Aug 21, 2024 at 09:41:31AM +1000, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> When compiling with '-fdeclone-ctor-dtor' (enabled by default with -Os),
> we run into issues where we don't correctly emit the underlying
> functions.  We also need to ensure that COMDAT constructors are marked
> as such before 'maybe_clone_body' attempts to propagate COMDAT groups to
> the new thunks.
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (post_load_processing): Mark COMDAT as needed, emit
>   declarations if maybe_clone_body fails.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/clone-2_a.C: New test.
>   * g++.dg/modules/clone-2_b.C: New test.
>   * g++.dg/modules/clone-3_a.C: New test.
>   * g++.dg/modules/clone-3_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc | 20 
>  gcc/testsuite/g++.dg/modules/clone-2_a.C |  7 +++
>  gcc/testsuite/g++.dg/modules/clone-2_b.C |  5 +
>  gcc/testsuite/g++.dg/modules/clone-3_a.C |  9 +
>  gcc/testsuite/g++.dg/modules/clone-3_b.C |  8 
>  5 files changed, 45 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/clone-2_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/clone-2_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/clone-3_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/clone-3_b.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 5cd4f313933..9a9c0fdfe81 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -17948,10 +17948,22 @@ post_load_processing ()
>dump () && dump ("Post-load processing of %N", decl);
>  
>gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl));
> -  /* Cloning can cause loading -- specifically operator delete for
> -  the deleting dtor.  */
> -  if (!TREE_ASM_WRITTEN (decl) && maybe_clone_body (decl))
> - TREE_ASM_WRITTEN (decl) = 1;
> +
> +  if (DECL_COMDAT (decl))
> + comdat_linkage (decl);
> +  if (!TREE_ASM_WRITTEN (decl))
> + {
> +   /* Cloning can cause loading -- specifically operator delete for
> +  the deleting dtor.  */
> +   if (maybe_clone_body (decl))
> + TREE_ASM_WRITTEN (decl) = 1;
> +   else
> + {
> +   /* We didn't clone the cdtor, make sure we emit it.  */
> +   note_vague_linkage_fn (decl);
> +   cgraph_node::finalize_function (decl, true);
> + }
> + }
>  }
>  
>cfun = old_cfun;
> diff --git a/gcc/testsuite/g++.dg/modules/clone-2_a.C 
> b/gcc/testsuite/g++.dg/modules/clone-2_a.C
> new file mode 100644
> index 000..47e21581fdc
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/clone-2_a.C
> @@ -0,0 +1,7 @@
> +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" }
> +// { dg-module-cmi M }
> +
> +export module M;
> +export struct S {
> +  inline S(int) {}
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/clone-2_b.C 
> b/gcc/testsuite/g++.dg/modules/clone-2_b.C
> new file mode 100644
> index 000..80c1e149518
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/clone-2_b.C
> @@ -0,0 +1,5 @@
> +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" }
> +
> +import M;
> +
> +S s(0);
> diff --git a/gcc/testsuite/g++.dg/modules/clone-3_a.C 
> b/gcc/testsuite/g++.dg/modules/clone-3_a.C
> new file mode 100644
> index 000..87de746f5c2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/clone-3_a.C
> @@ -0,0 +1,9 @@
> +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" }
> +// { dg-module-cmi M }
> +
> +export module M;
> +
> +struct A {};
> +export struct B : virtual A {
> +  inline B (int) {}
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/clone-3_b.C 
> b/gcc/testsuite/g++.dg/modules/clone-3_b.C
> new file mode 100644
> index 000..23c9ac4a804
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/clone-3_b.C
> @@ -0,0 +1,8 @@
> +// { dg-module-do link }
> +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" }
> +
> +import M;
> +
> +int main() {
> +  B b(0);
> +}
> -- 
> 2.43.2
>

Re: [PATCH 2/2] aarch64: remove SVE2 requirement from SME and diagnose it as unsupported

2024-10-04 Thread Kyrylo Tkachov

Hi Andre,

> On 2 Oct 2024, at 19:13, Andre Vieira  wrote:
> 
> 
> As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2, so we are removing
> that false dependency in GCC.  However, we chose for now to not support this
> combination of features and will diagnose the combination of FEAT_SME without
> FEAT_SVE2 as unsupported by GCC.  We may choose to support this in the future.
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-arches.def (SME): Remove SVE2 as prerequisite
>and add in FCMA and F16FML.
>* config/aarch64/aarch64.cc (aarch64_override_options): Diagnose use of
>SME without SVE2.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c:
>Pass +sve2 to existing +sme pragma.
>* gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c:
>Likewise.
>* gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise.
>* gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise.
>* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise.
>* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c:
>Likewise.
>* gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c:
>Likewise.
>* gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise.
>* gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c:
>Likewise.
>* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise.
>* gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise.
> ---
> gcc/config/aarch64/aarch64-option-extensions.def  | 3 ++-
> gcc/config/aarch64/aarch64.cc | 4 
> .../aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c| 2 +-
> .../aarch64/sve/acle/general-c/binary_opt_single_n_2.c| 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/binary_single_1.c   | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c| 2 +-
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/clamp_1.c | 2 +-
> .../aarch64/sve/acle/general-c/compare_scalar_count_1.c   | 2 +-
> .../aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c   | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/storexn_1.c | 2 +-
> .../aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c | 2 +-
> .../gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c | 2 +-
> 13 files changed, 17 insertions(+), 12 deletions(-)
> 

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 68913beaee2..bc2023da180 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18998,6 +18998,10 @@ aarch64_override_options (void)
  while processing functions with potential target attributes.  */
   target_option_default_node = target_option_current_node
 = build_target_option_node (&global_options, &global_options_set);
+
+  if (TARGET_SME && !TARGET_SVE2)
+warning (0, "this gcc version does not guarantee full support for +sme"
+ " without +sve2");
 }

Beyond my comments on the cover letter, if you do intend to give some message 
here anyway, this can be more fancy :)
You can use %qs to quote the +sme and +sve2 strings and I don’t think we 
usually refer to GCC itself from warnings.
I think a passive voice would fit better.

Regardless of what we do for the warning, this restriction should be documented 
in doc/invoke.texi if we end up having it for the GCC 15.1 release.
Thanks,
Kyrill

[PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-04 Thread Jakub Jelinek

Hi!

The PR notes that we don't emit optimal code for C++ spaceship
operator if the result is returned as an integer rather than the
result just being compared against different values and different
code executed based on that.
So e.g. for
template <typename T>
auto foo (T x, T y) { return x <=> y; }
instantiated for floating point, signed integer and unsigned integer
types.  auto in that case is std::strong_ordering or std::partial_ordering,
which are fancy C++ abstractions around struct with signed char member
which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial
ordering (but for -ffast-math 2 is never the case).
I'm afraid functions like that are fairly common and unless they are
inlined, we really need to map the comparison to those -1, 0, 1 or
-1, 0, 1, 2 values.
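
As a rough illustration of that mapping for the floating point case, here is a
minimal sketch written without <compare> so that it builds as plain C++.  The
name fp_spaceship_value is hypothetical (it is not from the patch or from
libstdc++), and the 2 for unordered simply follows the encoding described
above.

```cpp
#include <cmath>

// Hypothetical helper, for illustration only: the integer that a
// floating point x <=> y corresponds to under the -1, 0, 1, 2 encoding.
int fp_spaceship_value(double x, double y) {
  if (std::isnan(x) || std::isnan(y))
    return 2;                 // unordered
  return (x > y) - (x < y);   // 1 greater, 0 equivalent, -1 less
}
```

With -ffast-math the NaN branch would be assumed away, which matches the
remark that 2 is never the case there.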

Now, for floating point spaceship I've in the past already added an
optimization (with tree-ssa-math-opts.cc discovery and named optab, the
optab only defined on x86 though right now), which ensures there is just
a single comparison instruction and then just tests based on flags.
Now, if we have code like:
  auto a = x <=> y;
  if (a == std::partial_ordering::less)
    bar ();
  else if (a == std::partial_ordering::greater)
    baz ();
  else if (a == std::partial_ordering::equivalent)
    qux ();
  else if (a == std::partial_ordering::unordered)
    corge ();
etc., that results in decent code generation, the spaceship named pattern
on x86 optimizes for the jumps, so emits comparisons on the flags, followed
by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that
well.  But if the result needs to be stored into an integer and just
returned that way or there are no immediate jumps based on it (or turned
into some non-standard integer values like -42, 0, 36, 75 etc.), then CE
doesn't do a good job for that, we end up with say
        comiss  %xmm1, %xmm0
        jp      .L4
        seta    %al
        movl    $0, %edx
        leal    -1(%rax,%rax), %eax
        cmove   %edx, %eax
        ret
.L4:
        movl    $2, %eax
        ret
The jp is good, that is the unlikely case and can't be easily handled in
straight line code due to the layout of the flags, but the rest uses cmov,
which often isn't a win, and some weird math.
With the patch below we can get instead
        xorl    %eax, %eax
        comiss  %xmm1, %xmm0
        jp      .L2
        seta    %al
        sbbl    $0, %eax
        ret
.L2:
        movl    $2, %eax
        ret

The patch changes the discovery in the generic code, by detecting if
the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
-1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
a new argument to .SPACESHIP ifn, so that the named pattern is told whether
it should optimize for branches or for loading the result into a -1, 0, 1
(, 2) integer.  Additionally, it doesn't detect just floating point <=>
anymore, but also integer and unsigned integer, but in those cases only
if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
result in good code).
The backend then can for those integer or unsigned integer <=>s return
effectively (x > y) - (x < y) in a way that is efficient on the target
(so for x86 with ensuring zero initialization first when needed before
setcc; one for floating point and unsigned, where there is just one setcc
and the second one optimized into sbb instruction, two for the signed int
case).  So e.g. for signed int we now emit
        xorl    %edx, %edx
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setl    %dl
        setg    %al
        subl    %edx, %eax
        ret
and for unsigned
        xorl    %eax, %eax
        cmpl    %esi, %edi
        seta    %al
        sbbb    $0, %al
        ret

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, I wonder if other targets wouldn't benefit from defining the
named optab too...

2024-10-04  Jakub Jelinek  

PR middle-end/116896
* optabs.def (spaceship_optab): Use spaceship$a4 rather than
spaceship$a3.
* internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
rather than 2, expand the last one, expect 4 operands of
spaceship_optab.
* tree-ssa-math-opts.cc: Include cfghooks.h.
(optimize_spaceship): Check if a single PHI is initialized to
-1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
argument to .SPACESHIP and optimize away the comparisons,
otherwise pass 0.  Also check for integer comparisons rather than
floating point, in that case do it only if there is a single PHI
with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
if the <=> is signed, 2 if unsigned.
* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add
another rtx argument.
(ix86_expand_int_spaceship): Declare.
* config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add
arg3 argument, if it
