Re: [PATCH 2/3] aarch64: libgcc: add prototypes in cpuinfo
On Fri, 4 Oct 2024 at 10:00, Kyrylo Tkachov wrote:
>
> > On 3 Oct 2024, at 21:44, Christophe Lyon wrote:
> >
> > Add prototypes for __init_cpu_features_resolver and
> > __init_cpu_features to avoid warnings due to -Wmissing-prototypes.
> >
> > libgcc/
> > * config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
> > prototype.
> > (__init_cpu_features): Likewise.
> > ---
> > libgcc/config/aarch64/cpuinfo.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> > index 4b94fca8695..c62a7453e8e 100644
> > --- a/libgcc/config/aarch64/cpuinfo.c
> > +++ b/libgcc/config/aarch64/cpuinfo.c
> > @@ -418,6 +418,7 @@ __init_cpu_features_constructor(unsigned long hwcap,
> > setCPUFeature(FEAT_INIT);
> > }
> >
> > +void __init_cpu_features_resolver(unsigned long, const __ifunc_arg_t *);
> > void
> > __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg)
> > {
> > if (__aarch64_cpu_features.features)
> > @@ -425,6 +426,7 @@ __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
> > __init_cpu_features_constructor(hwcap, arg);
> > }
> >
> > +void __init_cpu_features(void);
> > void __attribute__ ((constructor))
> > __init_cpu_features(void) {
> > unsigned long hwcap;
>
> I thought the intent of the missing-prototypes warning is to warn about
> missing prototypes in a header file primarily.

Indeed, that's my understanding too.

> Should these prototypes go into gcc/common/config/aarch64/cpuinfo.h instead?

In that case, compilation of gcc/config/aarch64/aarch64.c fails because:
gcc/common/config/aarch64/cpuinfo.h:96:56: error: ‘__ifunc_arg_t’ does not name a type
and it does not seem obvious to expose this type in aarch64.c

IIUC, these functions never have their prototypes exposed/used, and
I'm not even sure how __init_cpu_features is called: in
dispatch_function_versions(), I only see a reference to
__init_cpu_features_resolver?
(But I'm not at all familiar with this code)

Thanks,

Christophe

> Thanks,
> Kyrill
>
> > --
> > 2.34.1
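For readers unfamiliar with the warning discussed above, here is a minimal sketch in C. The names `init_features`, `features_initialized` and `features_ready` are hypothetical and only stand in for the libgcc functions named in the patch. The sketch shows the approach the patch takes: a non-static function defined without a previous declaration triggers -Wmissing-prototypes, and a declaration placed directly before the definition silences it.

```c
/* Build with: gcc -Wmissing-prototypes -c example.c
   All names here are hypothetical; they are not the libgcc symbols.  */

static int features_ready;

/* Without this declaration GCC reports
   "no previous prototype for 'init_features'".  */
void init_features (void);

void
init_features (void)
{
  if (features_ready)
    return;
  features_ready = 1;
}

/* Accessor so the effect of init_features can be observed.
   It gets a prior declaration for the same reason.  */
int features_initialized (void);

int
features_initialized (void)
{
  return features_ready;
}
```

Whether such declarations belong in the .c file or in a shared header is exactly the open question in the thread; the sketch only illustrates the in-file variant.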
[Ada] Fix PR ada/116430
This is a regression present on the 14 branch only: the expander gets confused
when trying to insert the finalizer of a procedure that contains a package as
a subunit.

The offending code no longer exists on the mainline so this adds the minimal
fix to address the issue.

Tested on x86-64/Linux, applied on the 14 branch only.


2024-10-04  Eric Botcazou

	PR ada/116430
	* exp_ch7.adb (Build_Finalizer.Create_Finalizer): For the insertion
	point of the finalizer, deal with package bodies that are subunits.

-- 
Eric Botcazou

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index e594a534244..123abb63289 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -2051,6 +2051,12 @@ package body Exp_Ch7 is
                and then List_Containing (Finalizer_Insert_Nod) = Stmts)
             then
                Finalizer_Insert_Nod := Last_Top_Level_Ctrl_Construct;
+               if Nkind (Finalizer_Insert_Nod) = N_Package_Body
+                 and then Nkind (Parent (Finalizer_Insert_Nod)) = N_Subunit
+               then
+                  Finalizer_Insert_Nod :=
+                    Corresponding_Stub (Parent (Finalizer_Insert_Nod));
+               end if;
             end if;

             Insert_After (Finalizer_Insert_Nod, Fin_Body);
[PATCH] expr, v2: Don't clear whole unions [PR116416]
On Thu, Oct 03, 2024 at 12:14:35PM -0400, Jason Merrill wrote:
> Agreed, the padding bits have indeterminate values (or erroneous in C++26),
> so it's correct for infoleak-1.c to complain about 4b.

I've been afraid what the kernel people would say about this change (because
reading Linus' mails shows he doesn't care about what the standards say, but
what he expects to see, anything else is "broken").

> > Though, looking at godbolt, clang and icc 19 and older gcc all do zero
> > initialize the whole union before storing the single member in there (if
> > non-zero, otherwise just clear).
> >
> > So whether we want to do this or do it by default is another question.
>
> We will want to initialize the padding (for all types) to something for
> C++26, but that's a separate issue...

But ideally in a way where uninit warnings know the bits aren't initialized
even if they are.

> > Anyway, bootstrapped/regtested on x86_64-linux and i686-linux successfully.
> >
> > 2024-09-28  Jakub Jelinek
> >
> > PR c++/116416
> > * expr.cc (categorize_ctor_elements_1): Fix up union handling of
> > *p_complete. Clear it only if num_fields is 0 and the union has
> > at least one FIELD_DECL, set to -1 if either union has no fields
> > and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
> > returned false.
>
> Hmm, complete_ctor_at_level_p also seems to need a change for this
> understanding of union semantics: "every meaningful byte" depends on the
> active member, so it seems like it should return true for a union iff
> num_elts == 1.

I thought complete_ctor_at_level_p has a single caller, but apparently that
isn't the case, cp/typeck2.cc uses it too.
Here is an updated version of the patch, which
a) moves some of the stuff into complete_ctor_at_level_p (but not
   all the *p_complete = 0; case, for that it would need to change
   so that it passes around the ctor rather than just its type)
b) introduces a new option, so that users can either get the new
   behavior (only what is guaranteed by the standards, the default),
   or previous behavior (union padding zero initialization, no such
   guarantees in structures) or also a guarantee in structures
c) introduces a new CONSTRUCTOR flag which says that the padding bits
   (if any) should be zero initialized (and sets it for now in the C++
   FE for C23 {} initializers).

Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23, if there is just empty initializer, I think we already mark
it as incomplete if there are any missing initializers. Maybe with
some designated initializer games, say

void
foo ()
{
  struct S { char a; long long b; };
  struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
  ...
}

Is this supposed to initialize padding bits in C23 and then the
.c.a = 1 and .c.b = 2 stores preserve those padding bits, so is
that supposed to be different from
  struct T t2 = { .c = { 1, 2 } };
? What about just
  struct T t3 = { .c.a = 1, .c.b = 2 };
?

And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly is zero-initialization done (vs. other types of
initialization) and where is e.g. zero-initialization of a temporary then
(member-wise) copied.
Say

struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);

void
foo ()
{
  T t (1, 2);
  bar (&t);
}

Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits? Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?
Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.
For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.

Only lightly tested so far, this is mostly for further discussions.
And also a question what exactly does cp/typeck2.cc want from
complete_ctor_at_level_p, e.g. if it wants false for all the cases where
categorize_ctor_elements_1 does *p_complete = 0; (in that case it would need
to know whether CONSTRUCTOR_ZERO_PADDING_BITS flag was set).

2024-10-04  Jakub Jelinek

	PR c++/116416
gcc/
	* flag-types.h (enum zero_init_padding_bits_kind): New type.
	* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
	* common.opt (fzero-init-padding-bits=): New option.
	* expr.cc (categorize_ctor_elements_1): Handle
	CONSTRUCTOR_ZERO_PADDING_BITS or
	flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL. Fix up
	*p_complete = -1; setting for unions.
	(complete_ctor_at_level_p): Handle unions differently for
	flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
	* gimple-fold.cc (type_has_padding
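To make concrete which bytes the discussion above is about, here is a small sketch in C. It reuses the `struct S` layout from the message; `union U` and the three helper functions are hypothetical names added for illustration. The code only reports where padding sits and makes no claim about what value those bytes hold after initialization, since that is precisely what the patch and the proposed option would decide.

```c
#include <stddef.h>

/* Same layout as struct S in the message above.  */
struct S { char a; long long b; };

/* Hypothetical union: when only 'c' is initialized, the remaining
   bytes are the ones a compiler may or may not clear.  */
union U { char c; long long ll; };

/* Bytes between member 'a' and member 'b' that belong to no member.  */
static size_t
interior_padding (void)
{
  return offsetof (struct S, b) - sizeof (char);
}

/* All bytes of struct S that belong to no member.  */
static size_t
total_padding (void)
{
  return sizeof (struct S) - sizeof (char) - sizeof (long long);
}

/* Bytes of union U not covered by its smallest member 'c'.  */
static size_t
union_tail_bytes (void)
{
  return sizeof (union U) - sizeof (char);
}
```

The exact counts depend on the target ABI, so the sketch computes them rather than hard-coding them.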
Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved
On Fri, 4 Oct 2024 at 19:07, Christophe Lyon wrote:
>
> On Fri, 4 Oct 2024 at 16:59, Siarhei Volkau wrote:
> >
> > Hello,
> >
> > On Fri, 4 Oct 2024 at 16:48, Christophe Lyon wrote:
> > >
> > > Hi!
> > >
> > >
> > > On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau wrote:
> > > >
> > > > ping
> > > >
> > > > On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau wrote:
> > > > >
> > > > > This patch deals with consequences but not the root cause though.
> > > > >
> > > > > There are 5 cases which are subjects to rewrite:
> > > > > case #1:
> > > > > mov ip, r1
> > > > > add r2, ip
> > > > > # ip is dead here
> > > > > can be rewritten as:
> > > > > adds r2, r1
> > >
> > > Why replace 'add' with 'adds' ?
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> >
> > Good catch, actually. Silly answer is:
> > because there's no alternative without {S} for Lo registers in thumb1.
> >
> > Correct me if I'm wrong, I don't think that we have to do something
> > special with CC reg there because conditional execution instructions
> > (thumb1_cbz, cbranchsi4_insn) take care of that.
> > See thumb1_final_prescan_insn.
> >
>
> Not familiar with how this is handled, but my question is more like:
> if the original code is
> case #1:
> adds r3,r0 ;; or any instruction which sets CC
> mov ip, r1
> add r2, ip
> # ip is dead here
> cbz ...
>
> If you rewrite as
> adds r3,r0
> adds r2, r1
> cbz
> then you change CC and it does not get the value expected by cbz.
>
> Am I missing something?
>
> Thanks,
>
> Christophe
>

Your point is correct in general, but look at thumb1.md.
You will not find a separate "compare" pattern (except one) and
"if_then_else", which relies on the previous compare result.
Because they are combined in one insn pattern, they will be emitted
together as a pair of cmp/cbranch, later than peephole2.
So there's no chance for this patch to put an instruction in between.

But even if it happens somehow (as I said, there's one "compare" insn),
there's a mechanism which tracks condition codes for the branch insn.
And if the CC is not matched for the branch insn then an extra cmp will
be emitted.

> > Thanks
> >
> > Siarhei
> >
> > > > >
> > > > > case #2:
> > > > > add ip, r1
> > > > > mov r1, ip
> > > > > # ip is dead here
> > > > > can be rewritten as:
> > > > > add r1, ip
> > > > >
> > > > > case #3:
> > > > > mov ip, r1
> > > > > add r2, ip
> > > > > add r3, ip
> > > > > # ip is dead here
> > > > > can be rewritten as:
> > > > > adds r2, r1
> > > > > adds r3, r1
> > > > >
> > > > > case #4:
> > > > > mov ip, r1
> > > > > add ip, r2
> > > > > mov r1, ip
> > > > > can be rewritten as:
> > > > > adds r1, r2
> > > > > mov ip, r1 <- might be eliminated too, if ip is dead
> > > > >
> > > > > case #5 (arbitrary):
> > > > > mov r1, ip
> > > > > subs r2, r1, r2
> > > > > mov ip, r2
> > > > > # r1 is dead here
> > > > > can be rewritten as:
> > > > > rsbs r1, r2, #0
> > > > > add ip, r1
> > > > > movs r2, ip <- might be eliminated, if r2 is dead
> > > > >
> > > > > Speed profit wasn't checked but size changes are the following:
> > > > >    libgcc:  -132 bytes / -0.25%
> > > > >      libc: -1262 bytes / -0.55%
> > > > >      libm:  -384 bytes / -0.42%
> > > > > libstdc++: -2258 bytes / -0.30%
> > > > >
> > > > > No tests provided because its hard to force GCC to emit HI_REGS
> > > > > in a small and straightforward function.
> > > > > > > > > > Signed-off-by: Siarhei Volkau > > > > > --- > > > > > gcc/config/arm/thumb1.md | 93 > > > > > +++- > > > > > 1 file changed, 92 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > > > > index d7074b43f60..9da4af9eccd 100644 > > > > > --- a/gcc/config/arm/thumb1.md > > > > > +++ b/gcc/config/arm/thumb1.md > > > > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn" > > > > > (set_attr "conds" "clob") > > > > > (set_attr "type" "multiple")] > > > > > ) > > > > > - > > > > > + > > > > > +;; bad code emitted when HI_REGS involved in addition > > > > > +;; subtract also might happen rarely > > > > > + > > > > > +;; case #1: > > > > > +;; mov ip, r1 > > > > > +;; add r2, ip # ip is dead after that > > > > > +(define_peephole2 > > > > > + [(set (match_operand:SI 0 "register_operand" "") > > > > > + (match_operand:SI 1 "register_operand" "")) > > > > > + (set (match_operand:SI 2 "register_operand" "") > > > > > + (plus:SI (match_dup 2) (match_dup 0)))] > > > > > + "TARGET_THUMB1 > > > > > +&& peep2_reg_dead_p (2, operands[0]) > > > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > > > > + [(set (match_dup 2) > > > > > + (plus:SI (match_dup 2) (match_dup 1)))] > > > > > + "") > > > > > + > > > > > +;; case #2: > > > > > +;; add ip, r1 > > > > > +;; mov r1, ip # ip is dead after that > > > > > +(define_peephole2 > > > > > + [(set (match_operand:SI 0 "register_operand" "") > > > >
[PATCH v3 3/5] openmp: Add support for iterators in 'target update' clauses (C/C++)
This patch extends the previous patch to cover to/from clauses in 'target update'.From 1c8bf84ec99fe2fd371e345f012eb0d84a923153 Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Fri, 4 Oct 2024 15:16:21 +0100 Subject: [PATCH 3/5] openmp: Add support for iterators in 'target update' clauses (C/C++) This adds support for iterators in 'to' and 'from' clauses in the 'target update' OpenMP directive. 2024-10-04 Kwok Cheung Yeung gcc/c/ * c-parser.cc (c_parser_omp_clause_from_to): Parse 'iterator' modifier. * c-typeck.cc (c_finish_omp_clauses): Finish iterators for to/from clauses. gcc/cp/ * parser.cc (cp_parser_omp_clause_from_to): Parse 'iterator' modifier. * semantics.cc (finish_omp_clauses): Finish iterators for to/from clauses. gcc/ * gimplify.cc (gimplify_scan_omp_clauses): Call check_omp_map_iterators on clauses with iterators. Skip gimplification of clause decl and size for clauses with iterators. * omp-low.cc (lower_omp_target): Call lower_omp_map_iterators on to/from clauses. * tree-pretty-print.cc (dump_omp_clause): Call dump_omp_iterators for to/from clauses with iterators. * tree.cc (omp_clause_num_ops): Add extra operand for OMP_CLAUSE_FROM and OMP_CLAUSE_TO. * tree.h (OMP_CLAUSE_HAS_ITERATORS): Add check for OMP_CLAUSE_TO and OMP_CLAUSE_FROM. (OMP_CLAUSE_ITERATORS): Likewise. gcc/testsuite/ * c-c++-common/gomp/target-update-iterators-1.c: New. * c-c++-common/gomp/target-update-iterators-2.c: New. * c-c++-common/gomp/target-update-iterators-3.c: New. libgomp/ * target.c (gomp_update): Call gomp_merge_iterator_maps. Free allocated variables. * testsuite/libgomp.c-c++-common/target-update-iterators-1.c: New. * testsuite/libgomp.c-c++-common/target-update-iterators-2.c: New. * testsuite/libgomp.c-c++-common/target-update-iterators-3.c: New. 
--- gcc/c/c-parser.cc | 105 +++-- gcc/c/c-typeck.cc | 5 +- gcc/cp/parser.cc | 111 -- gcc/cp/semantics.cc | 5 +- gcc/gimplify.cc | 18 ++- gcc/omp-low.cc| 3 +- .../gomp/target-update-iterators-1.c | 20 .../gomp/target-update-iterators-2.c | 17 +++ .../gomp/target-update-iterators-3.c | 17 +++ gcc/tree-pretty-print.cc | 10 ++ gcc/tree.cc | 4 +- gcc/tree.h| 8 +- libgomp/target.c | 14 +++ .../target-update-iterators-1.c | 65 ++ .../target-update-iterators-2.c | 58 + .../target-update-iterators-3.c | 67 +++ 16 files changed, 496 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-3.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-2.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-3.c diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 184fc076388..c2a5985c89b 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -19304,8 +19304,11 @@ c_parser_omp_clause_device_type (c_parser *parser, tree list) to ( variable-list ) OpenMP 5.1: - from ( [present :] variable-list ) - to ( [present :] variable-list ) */ + from ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list ) + to ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list ) + + motion-modifier: + present | iterator (iterators-definition) */ static tree c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind, @@ -19316,15 +19319,88 @@ c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind, if (!parens.require_open (parser)) return list; + int pos = 1, colon_pos = 0; + int iterator_length = 0; + while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME) +{ + if (c_parser_peek_nth_token_raw 
(parser, pos + 1)->type + == CPP_OPEN_PAREN) + { + unsigned int n = pos + 2; + if (c_parser_check_balanced_raw_token_sequence (parser, &n) +&& (c_parser_peek_nth_token_raw (parser, n)->type +== CPP_CLOSE_PAREN)) + { + iterator_length = n - pos + 1; + pos = n; + } + } + if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CP
arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]
Hi,

The patch for 'arm: Fix missed CE optimization for armv8.1-m.main [PR 116444]'
introduced regressions with arm targets that used 'noce' before. This is
because it would approve all noce optimisations without using the default
cost check.

Not sure why this didn't show up in my original testing, I suspect you need
to test this for a set of specific targets like Torbjorn did, thank you for
pointing these issues out to me. Could I ask you to rerun them with this
patch? I'll try to do that locally too.

Happy to receive reviews, but I'm waiting for Torbjorn and my own testing to
complete before committing.

When not dealing with the special armv8.1-m.main conditional instructions
case make sure it uses the default_noce_conversion_profitable_p call to
determine whether the sequence is cost effective.

gcc/ChangeLog:

	PR target/116444
	* config/arm/arm.cc (arm_noce_conversion_profitable_p): Call
	default_noce_conversion_profitable_p when not dealing with the
	armv8.1-m.main conditional instructions special cases.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 077c80df4482d168d9694795be68c2eeb8f304d9..fd437f428781673e1d44498d31a47f174e0f57fa 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -36168,7 +36168,7 @@ arm_noce_conversion_profitable_p (rtx_insn *seq,
 				  struct noce_if_info *if_info)
 {
   if (!TARGET_COND_ARITH || reload_completed)
-    return true;
+    return default_noce_conversion_profitable_p (seq, if_info);

   if (arm_is_v81m_cond_insn (seq))
     return true;
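The shape of the fix can be sketched outside of GCC as a target hook that defers to a generic cost check unless a target-specific special case applies. Everything below is a hypothetical stand-in: `struct seq_info`, `default_profitable_p` and `target_profitable_p` are not GCC types or functions, and the final fallback is an assumption, since the diff above does not show the end of the real function.

```c
#include <stdbool.h>

/* Hypothetical summary of a candidate instruction sequence.  */
struct seq_info
{
  int cost;                   /* estimated cost of the converted sequence */
  int budget;                 /* cost of the code it would replace */
  bool is_special_cond_insn;  /* models the armv8.1-m.main special case */
};

/* Stand-in for the generic cost check.  */
static bool
default_profitable_p (const struct seq_info *s)
{
  return s->cost <= s->budget;
}

/* Stand-in for the target hook after the fix.  */
static bool
target_profitable_p (const struct seq_info *s, bool has_cond_arith,
                     bool reload_completed)
{
  /* Before the fix this path returned true unconditionally, which
     approved every conversion without looking at its cost.  */
  if (!has_cond_arith || reload_completed)
    return default_profitable_p (s);

  if (s->is_special_cond_insn)
    return true;

  /* Assumption: remaining cases also use the generic check.  */
  return default_profitable_p (s);
}
```

With this shape, targets that never use the special case see the same decisions as they did before the hook was introduced.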
Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved
On Fri, 4 Oct 2024 at 16:59, Siarhei Volkau wrote:
>
> Hello,
>
> On Fri, 4 Oct 2024 at 16:48, Christophe Lyon wrote:
> >
> > Hi!
> >
> >
> > On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau wrote:
> > >
> > > ping
> > >
> > > On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau wrote:
> > > >
> > > > This patch deals with consequences but not the root cause though.
> > > >
> > > > There are 5 cases which are subjects to rewrite:
> > > > case #1:
> > > > mov ip, r1
> > > > add r2, ip
> > > > # ip is dead here
> > > > can be rewritten as:
> > > > adds r2, r1
> >
> > Why replace 'add' with 'adds' ?
> >
> > Thanks,
> >
> > Christophe
> >
>
> Good catch, actually. Silly answer is:
> because there's no alternative without {S} for Lo registers in thumb1.
>
> Correct me if I'm wrong, I don't think that we have to do something
> special with CC reg there because conditional execution instructions
> (thumb1_cbz, cbranchsi4_insn) take care of that.
> See thumb1_final_prescan_insn.
>

Not familiar with how this is handled, but my question is more like:
if the original code is
case #1:
adds r3,r0 ;; or any instruction which sets CC
mov ip, r1
add r2, ip
# ip is dead here
cbz ...

If you rewrite as
adds r3,r0
adds r2, r1
cbz
then you change CC and it does not get the value expected by cbz.

Am I missing something?

Thanks,

Christophe

> Thanks
>
> Siarhei
>
> > > >
> > > > case #2:
> > > > add ip, r1
> > > > mov r1, ip
> > > > # ip is dead here
> > > > can be rewritten as:
> > > > add r1, ip
> > > >
> > > > case #3:
> > > > mov ip, r1
> > > > add r2, ip
> > > > add r3, ip
> > > > # ip is dead here
> > > > can be rewritten as:
> > > > adds r2, r1
> > > > adds r3, r1
> > > >
> > > > case #4:
> > > > mov ip, r1
> > > > add ip, r2
> > > > mov r1, ip
> > > > can be rewritten as:
> > > > adds r1, r2
> > > > mov ip, r1 <- might be eliminated too, if ip is dead
> > > >
> > > > case #5 (arbitrary):
> > > > mov r1, ip
> > > > subs r2, r1, r2
> > > > mov ip, r2
> > > > # r1 is dead here
> > > > can be rewritten as:
> > > > rsbs r1, r2, #0
> > > > add ip, r1
> > > > movs r2, ip <- might be eliminated, if r2 is dead
> > > >
> > > > Speed profit wasn't checked but size changes are the following:
> > > >    libgcc:  -132 bytes / -0.25%
> > > >      libc: -1262 bytes / -0.55%
> > > >      libm:  -384 bytes / -0.42%
> > > > libstdc++: -2258 bytes / -0.30%
> > > >
> > > > No tests provided because its hard to force GCC to emit HI_REGS
> > > > in a small and straightforward function.
> > > > > > > > Signed-off-by: Siarhei Volkau > > > > --- > > > > gcc/config/arm/thumb1.md | 93 +++- > > > > 1 file changed, 92 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > > > index d7074b43f60..9da4af9eccd 100644 > > > > --- a/gcc/config/arm/thumb1.md > > > > +++ b/gcc/config/arm/thumb1.md > > > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn" > > > > (set_attr "conds" "clob") > > > > (set_attr "type" "multiple")] > > > > ) > > > > - > > > > + > > > > +;; bad code emitted when HI_REGS involved in addition > > > > +;; subtract also might happen rarely > > > > + > > > > +;; case #1: > > > > +;; mov ip, r1 > > > > +;; add r2, ip # ip is dead after that > > > > +(define_peephole2 > > > > + [(set (match_operand:SI 0 "register_operand" "") > > > > + (match_operand:SI 1 "register_operand" "")) > > > > + (set (match_operand:SI 2 "register_operand" "") > > > > + (plus:SI (match_dup 2) (match_dup 0)))] > > > > + "TARGET_THUMB1 > > > > +&& peep2_reg_dead_p (2, operands[0]) > > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > > > + [(set (match_dup 2) > > > > + (plus:SI (match_dup 2) (match_dup 1)))] > > > > + "") > > > > + > > > > +;; case #2: > > > > +;; add ip, r1 > > > > +;; mov r1, ip # ip is dead after that > > > > +(define_peephole2 > > > > + [(set (match_operand:SI 0 "register_operand" "") > > > > + (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" > > > > ""))) > > > > + (set (match_dup 1) (match_dup 0))] > > > > + "TARGET_THUMB1 > > > > +&& peep2_reg_dead_p (2, operands[0]) > > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > > > + [(set (match_dup 1) > > > > + (plus:SI (match_dup 1) (match_dup 0)))] > > > > + "") > > > > + > > > > +;; case #3: > > > > +;; mov ip, r1 > > > > +;; add r2, ip > > > > +;; add r3, ip # ip is dead after that > > > > +(define_peephole2 > > > > + [(set (match_operand:SI 0 "register_operand" "") > > > > + 
(match_operand:SI 1 "register_operand" "")) > > > > + (set (match_operand:SI 2 "register_operand" "") > > > > + (plus:SI (match_dup 2) (match_dup 0))) > > > > + (set (match_operand:SI 3 "register_operand" "") > > > > + (plus:SI (match_dup 3) (match_dup 0)))] > > > > + "TARGET_THUMB1 > > > > +&& peep2_reg_dead_p (3, operands[0
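The condition-code concern raised in this thread can be illustrated with a toy model in C. This is not a Thumb-1 emulator and not GCC code: `struct cpu` and the helper functions are hypothetical, and the model tracks only a zero flag. It shows why swapping a flag-preserving `add` for a flag-setting `adds` is observable when a later instruction reads the flags. Per the follow-up in the thread, GCC emits compare and branch together after peephole2, so this interleaving is not expected to arise in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy machine state: 16 registers and a zero flag.  */
struct cpu { uint32_t r[16]; bool z; };

enum { IP = 12 };

/* Register move and high-register add: flags untouched in this model.  */
static void mov (struct cpu *c, int d, int s) { c->r[d] = c->r[s]; }
static void add (struct cpu *c, int d, int s) { c->r[d] += c->r[s]; }

/* Flag-setting add: updates Z from the result.  */
static void
adds (struct cpu *c, int d, int s)
{
  c->r[d] += c->r[s];
  c->z = (c->r[d] == 0);
}

/* Original sequence: adds r3,r0; mov ip,r1; add r2,ip.  */
static bool
z_after_original (struct cpu c)
{
  adds (&c, 3, 0);
  mov (&c, IP, 1);
  add (&c, 2, IP);
  return c.z;   /* still reflects r3 + r0 */
}

/* Rewritten sequence: adds r3,r0; adds r2,r1.  */
static bool
z_after_rewrite (struct cpu c)
{
  adds (&c, 3, 0);
  adds (&c, 2, 1);
  return c.z;   /* now reflects r2 + r1 */
}
```

With r0 = r3 = 0 and r1 = r2 = 1, the original sequence leaves Z set while the rewritten one clears it, which is the difference the question is about.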
Re: [PATCH] aarch64: Fix bug with max/min (PR116934)
writes: > In ac4cdf5cb43c0b09e81760e2a1902ceebcf1a135, I introduced a bug where > I put the new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN, into the > wrong iterator. > > I should have put new unspecs in SVE_COND_FP_MAXMIN but I put it in > SVE_COND_FP_BINARY_REG instead. That was incorrect because the > SVE_COND_FP_MAXMIN iterator is being used for predicated floating-point > maximum/minimum, not SVE_COND_FP_BINARY_REG. > > Also added a testcase to validate the new change. > > Regression tested on aarch64-unknown-linux-gnu and found no regressions. > There are some test cases with "libitm" in their directory names which > appear in compare_tests output as changed tests but it looks like they > are in the output just because of changed build directories, like from > build-patched/aarch64-unknown-linux-gnu/./libitm/* to > build-pristine/aarch64-unknown-linux-gnu/./libitm/*. I didn't think it > was a cause of concern and have pushed this for review. > > gcc/ChangeLog: > > * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and > UNSPEC_COND_SMIN to correct iterators. > > gcc/testsuite/ChangeLog: > > PR target/116934 > * gcc.target/aarch64/sve2/pr116934.c: New test. OK, thanks. I see the only effect of the patch is (rightly) to add back the constant zero alternatives. 
Richard > --- > gcc/config/aarch64/iterators.md | 8 > gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c | 13 + > 2 files changed, 17 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c > > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index 0836dee61c9..fcad236eee9 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -3125,9 +3125,7 @@ > > (define_int_iterator SVE_COND_FP_BINARY_REG >[UNSPEC_COND_FDIV > - UNSPEC_COND_FMULX > - UNSPEC_COND_SMAX > - UNSPEC_COND_SMIN]) > + UNSPEC_COND_FMULX]) > > (define_int_iterator SVE_COND_FCADD [UNSPEC_COND_FCADD90 >UNSPEC_COND_FCADD270]) > @@ -3135,7 +3133,9 @@ > (define_int_iterator SVE_COND_FP_MAXMIN [UNSPEC_COND_FMAX >UNSPEC_COND_FMAXNM >UNSPEC_COND_FMIN > - UNSPEC_COND_FMINNM]) > + UNSPEC_COND_FMINNM > + UNSPEC_COND_SMAX > + UNSPEC_COND_SMIN]) > > (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA > UNSPEC_COND_FMLS > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c > b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c > new file mode 100644 > index 000..94fb96ffa7d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-Ofast -mcpu=neoverse-v2" } */ > + > +int a; > +float *b; > + > +void foo() { > + for (; a; a--, b += 4) { > +b[0] = b[1] = b[2] = b[2] > 0 ?: 0; > +if (b[3] < 0) > + b[3] = 0; > + } > +}
Re: [PATCH v1] Add -ftime-report-wall
On Thu, 2024-10-03 at 11:15 -0700, Andi Kleen wrote:
> > The only consumer I know of for the JSON time report data is in the
> > integration tests I wrote for -fanalyzer, which assumes that all
> > fields are present when printing, and then goes on to use the "user"
> > times for summarizing; see this commit FWIW:
> > https://github.com/davidmalcolm/gcc-analyzer-integration-tests/commit/5420ce968e6eae886e61486555b54fd460e0d35f
>
> It seems to be broken even without my changes:
>
> % ./gcc/cc1plus -ftime-report -fdiagnostics-format=sarif-file ../tsrc/tramp3d-v4.i
> cc1plus: internal compiler error: Segmentation fault

Oops, thanks; I'm tracking this as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116978
and working on a fix.

Dave
Re: [PATCH] x86: Disable stack protector for naked functions
On Fri, Oct 4, 2024 at 2:11 PM H.J. Lu wrote:
>
> Since naked functions should not enable stack protector, define
> TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
> for naked functions.
>
> gcc/
>
> PR target/116962
> * config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
> function.
> (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.
>
> gcc/testsuite/
>
> PR target/116962
> * gcc.target/i386/pr116962.c: New file.
>
> OK for master?

OK, also for backports.

Thanks,
Uros.
Re: [to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes
On Fri, 4 Oct 2024, Jeff Law wrote:

> > More importantly may I ask you to review the second paragraph of commit
> > 6c3365e715fa ("RISC-V: Also handle sign extension in branch costing") to
> > see if any of the other issues referred there have also been now sorted
> > and mention that in the change description, possibly with a commit hash
> > reference to Andrew P's recent improvements? And in particular can the
> > branch costs requested be lowered for gcc.target/riscv/cset-sext.c now?
> So with Andrew's changes those tests are no longer sensitive to branch cost at
> all AFAICT. I suspect we could just remove the explicit branch cost
> directives completely from the C tests. They'd still be needed for the RTL
> tests since those are unaffected by Andrew's changes. Thoughts?

I expected this to be the case given the nature of Andrew's changes. So
my suggestion is to set `-mbranch-cost=1' with the C tests instead, so as
to have the lack of sensitivity to branch costing covered now.

  Maciej
Re: [to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes
On 10/3/24 5:40 PM, Maciej W. Rozycki wrote:

> More importantly may I ask you to review the second paragraph of commit
> 6c3365e715fa ("RISC-V: Also handle sign extension in branch costing") to
> see if any of the other issues referred there have also been now sorted
> and mention that in the change description, possibly with a commit hash
> reference to Andrew P's recent improvements? And in particular can the
> branch costs requested be lowered for gcc.target/riscv/cset-sext.c now?

So with Andrew's changes those tests are no longer sensitive to branch
cost at all AFAICT. I suspect we could just remove the explicit branch
cost directives completely from the C tests. They'd still be needed for
the RTL tests since those are unaffected by Andrew's changes. Thoughts?

jeff
[PATCH v3 4/5] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran)
This patch adds support for iterators in the map clause of OpenMP target constructs. The parsing and translation of iterators in the front-end works the same way as for the affinity and depend clauses, except for putting the iterator into the OMP_CLAUSE_ITERATOR of the clause.

The iterator gimplification needed to be modified slightly to handle Fortran. The difference in how ranges work in loops (i.e. the condition on the upper bound is <=, rather than < as in C/C++) needs to be compensated for when calculating the iteration count and in the iteration loop itself.

During Fortran translation of iterators, statements for the side-effects of any translated expressions are placed into BLOCK_SUBBLOCKS of the block containing the iterator variables (this also occurs with the other clauses supporting iterators). However, the previous lowering of iterators into Gimple does not appear to do anything with these statements, which causes issues if anything in the loop body references these side-effects (typically calculation of array boundaries and strides). This appears to be a bug that was simply not triggered by existing testcases. These statements are now gimplified into the innermost loop body.

The libgomp runtime was modified to handle GOMP_MAP_STRUCTs in iterators, which can result from the use of derived types (which I used in test cases to implement arrays of pointers). libgomp expects a GOMP_MAP_STRUCT map to be followed immediately by a number of maps corresponding to the fields of the struct, so an iterator GOMP_MAP_STRUCT and its fields need to be expanded in a breadth-first order, rather than the usual depth-first manner (which would result in multiple GOMP_MAP_STRUCTS, followed by multiple instances of the first field, then multiples of the second etc.).

The presence of variables in the field offset triggers the unwanted creation of GOMP_MAP_STRUCT_UNORD for variable offsets.
The offset tree is now walked over and if it only contains iterator variables, then the offset is treated as constant again (which it is, within the context of each iteration of the iterator).From a24aa032c2e23577d4fbc61df6da79345bae8292 Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Fri, 4 Oct 2024 15:16:29 +0100 Subject: [PATCH 4/5] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran) This adds support for iterators in map clauses within OpenMP 'target' constructs in Fortran. Some special handling for struct field maps has been added to libgomp in order to handle arrays of derived types. 2024-10-04 Kwok Cheung Yeung gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Add iterator support for OMP_LIST_MAP. * openmp.cc (gfc_free_omp_clauses): Free namespace in namelist for OMP_LIST_MAP. (gfc_match_omp_clauses): Parse 'iterator' modifier for 'map' clause. (resolve_omp_clauses): Resolve iterators for OMP_LIST_MAP. * trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in OMP_LIST_MAP clauses. Add expressions to iter_block rather than block. gcc/ * gimplify.cc (compute_iterator_count): Account for difference in loop boundaries in Fortran. (build_iterator_loop): Change upper boundary condition for Fortran. Insert block statements into innermost loop. (contains_only_iterator_vars_1): New. (contains_only_iterator_vars): New. (extract_base_bit_offset): Add iterator argument. Do not set variable_offset if contains_only_iterator_vars is true. (omp_accumulate_sibling_list): Add iterator argument to extract_base_bit_offset. * omp-low.cc (lower_omp_target): Add sorry if iterators used with deep mapping. * tree-pretty-print.cc (dump_block_node): Ignore BLOCK_SUBBLOCKS containing iterator block statements. gcc/testsuite/ * gfortran.dg/gomp/target-map-iterators-1.f90: New. * gfortran.dg/gomp/target-map-iterators-2.f90: New. * gfortran.dg/gomp/target-map-iterators-3.f90: New. 
libgomp/ * target.c (kind_to_name): Handle GOMP_MAP_STRUCT and GOMP_MAP_STRUCT_UNORD. (gomp_add_map): New. (gomp_merge_iterator_maps): Expand fields of a struct mapping breadth-first. * testsuite/libgomp.fortran/target-map-iterators-1.f90: New. * testsuite/libgomp.fortran/target-map-iterators-2.f90: New. * testsuite/libgomp.fortran/target-map-iterators-3.f90: New. --- gcc/fortran/dump-parse-tree.cc| 9 +- gcc/fortran/openmp.cc | 35 ++-- gcc/fortran/trans-openmp.cc | 71 gcc/gimplify.cc | 76 ++--- gcc/omp-low.cc| 5 ++ .../gomp/target-map-iterators-1.f90 | 26 ++ .../gomp/target-map-iterators-2.f90 | 27
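To make the loop-bound difference described in this patch concrete, here is a minimal C sketch of how the iteration count changes between an exclusive upper bound (C/C++) and an inclusive one (Fortran). This is not the code in gimplify.cc; the function names `iter_count_exclusive` and `iter_count_inclusive` are made up for illustration, and the sketch assumes a non-zero step and no overflow in the intermediate arithmetic.

```c
/* Illustrative only: count the iterations of a begin:end:step range.
   Assumes step != 0 and that the arithmetic does not overflow.  */

/* C/C++ style: the upper bound is exclusive (i < end, or i > end for
   a negative step).  */
long
iter_count_exclusive (long begin, long end, long step)
{
  long adjust = step > 0 ? step - 1 : step + 1;
  long count = (end - begin + adjust) / step;
  return count > 0 ? count : 0;
}

/* Fortran style: the upper bound is inclusive (i <= end, or i >= end
   for a negative step).  */
long
iter_count_inclusive (long begin, long end, long step)
{
  long count = (end - begin + step) / step;
  return count > 0 ? count : 0;
}
```

For the range 1:10:3 the exclusive form visits 1, 4, 7 while the inclusive form also visits 10, which is the kind of off-by-one that has to be compensated for when the same lowering is shared between front ends.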
[PATCH v3 5/5] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran)
This patch adds parsing and translation of the 'to' and 'from' clauses for the 'target update' construct in Fortran.

From da8ab0cb38d2bc347cf902ec417b0397c28e24e2 Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Fri, 4 Oct 2024 15:16:38 +0100 Subject: [PATCH 5/5] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran) This adds Fortran support for iterators in 'to' and 'from' clauses in the 'target update' OpenMP directive. 2024-10-04 Kwok Cheung Yeung gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Add iterator support for OMP_LIST_TO and OMP_LIST_FROM. * openmp.cc (gfc_free_omp_clauses): Free namespace for OMP_LIST_TO and OMP_LIST_FROM. (gfc_match_motion_var_list): Parse 'iterator' modifier. (resolve_omp_clauses): Resolve iterators for OMP_LIST_TO and OMP_LIST_FROM. * trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in OMP_LIST_TO and OMP_LIST_FROM clauses. Add expressions to iter_block rather than block. gcc/testsuite/ * gfortran.dg/gomp/target-update-iterators-1.f90: New. * gfortran.dg/gomp/target-update-iterators-2.f90: New. * gfortran.dg/gomp/target-update-iterators-3.f90: New. libgomp/ * testsuite/libgomp.fortran/target-update-iterators-1.f90: New. * testsuite/libgomp.fortran/target-update-iterators-2.f90: New. * testsuite/libgomp.fortran/target-update-iterators-3.f90: New. 
--- gcc/fortran/dump-parse-tree.cc| 7 +- gcc/fortran/openmp.cc | 62 +-- gcc/fortran/trans-openmp.cc | 50 ++-- .../gomp/target-update-iterators-1.f90| 25 ++ .../gomp/target-update-iterators-2.f90| 22 ++ .../gomp/target-update-iterators-3.f90| 23 ++ .../target-update-iterators-1.f90 | 68 .../target-update-iterators-2.f90 | 63 +++ .../target-update-iterators-3.f90 | 78 +++ 9 files changed, 386 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-3.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-3.f90 diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 3ee6ed1ea7f..0a2d546d3fe 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -1360,7 +1360,8 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n) { gfc_current_ns = ns_curr; if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND - || list_type == OMP_LIST_MAP) + || list_type == OMP_LIST_MAP + || list_type == OMP_LIST_TO || list_type == OMP_LIST_FROM) { gfc_current_ns = n->u2.ns ? 
n->u2.ns : ns_curr; if (n->u2.ns != ns_iter) @@ -1376,6 +1377,10 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n) fputs ("DEPEND (", dumpfile); else if (list_type == OMP_LIST_MAP) fputs ("MAP (", dumpfile); + else if (list_type == OMP_LIST_TO) + fputs ("TO (", dumpfile); + else if (list_type == OMP_LIST_FROM) + fputs ("FROM (", dumpfile); else gcc_unreachable (); } diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 3003ba605cf..c765d5814a7 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -194,7 +194,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c) for (i = 0; i < OMP_LIST_NUM; i++) gfc_free_omp_namelist (c->lists[i], i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND - || i == OMP_LIST_MAP, + || i == OMP_LIST_MAP + || i == OMP_LIST_TO || i == OMP_LIST_FROM, i == OMP_LIST_ALLOCATE, i == OMP_LIST_USES_ALLOCATORS, i == OMP_LIST_INIT); @@ -1368,17 +1369,65 @@ gfc_match_motion_var_list (const char *str, gfc_omp_namelist **list, if (m != MATCH_YES) return m; - match m_present = gfc_match (" present : "); + gfc_namespace *ns_iter = NULL, *ns_curr = gfc_current_ns; + int present_modifier = 0, iterator_modifier = 0; + locus present_locus = gfc_current_locus, iterator_locus = gfc_current_locus; - m = gfc_match_omp_variable_list ("", list, false, NULL, h
RE: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction
> -Original Message- > From: Kyrylo Tkachov > Sent: Thursday, October 3, 2024 4:45 PM > To: Richard Sandiford > Cc: Soumya AR ; Tamar Christina > ; gcc-patches@gcc.gnu.org; Richard Earnshaw > ; Jennifer Schmitz ; > Pengxuan Zheng (QUIC) > Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE > instruction > > > > > On 3 Oct 2024, at 16:41, Richard Sandiford > wrote: > > > > External email: Use caution opening links or attachments > > > > > > Soumya AR writes: > >> From 7fafcb5e0174c56205ec05406c9a412196ae93d3 Mon Sep 17 00:00:00 > 2001 > >> From: Soumya AR > >> Date: Thu, 3 Oct 2024 11:53:07 +0530 > >> Subject: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE > >> instruction > >> > >> This patch uses the FSCALE instruction provided by SVE to implement the > >> standard ldexp family of functions. > >> > >> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the > >> following code: > >> > >> float > >> test_ldexpf (float x, int i) > >> { > >> return __builtin_ldexpf (x, i); > >> } > >> > >> double > >> test_ldexp (double x, int i) > >> { > >> return __builtin_ldexp(x, i); > >> } > >> > >> GCC Output: > >> > >> test_ldexpf: > >> b ldexpf > >> > >> test_ldexp: > >> b ldexp > >> > >> Since SVE has support for an FSCALE instruction, we can use this to process > >> scalar floats by moving them to a vector register and performing an fscale > >> call, > >> similar to how LLVM tackles an ldexp builtin as well. > >> > >> New Output: > >> > >> test_ldexpf: > >> fmov s31, w0 > >> ptrue p7.b, all > >> fscale z0.s, p7/m, z0.s, z31.s > >> ret > >> > >> test_ldexp: > >> sxtw x0, w0 > >> ptrue p7.b, all > >> fmov d31, x0 > >> fscale z0.d, p7/m, z0.d, z31.d > >> ret > >> > >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no > regression. > >> OK for mainline? > > > > Could we also use the .H form for __builtin_ldexpf16? 
> > > > I suppose: > > > >> @@ -2286,7 +2289,8 @@ > >> (VNx8DI "VNx2BI") (VNx8DF "VNx2BI") > >> (V8QI "VNx8BI") (V16QI "VNx16BI") > >> (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI") > >> - (V4SI "VNx4BI") (V2DI "VNx2BI")]) > >> + (V4SI "VNx4BI") (V2DI "VNx2BI") > >> + (SF "VNx4BI") (DF "VNx2BI")]) > > > > ...this again raises the question what we should do for predicate > > modes when the data mode isn't a natural SVE mode. That came up > > recently in relation to V1DI in the popcount patch, and for reductions > > in the ANDV etc. patch. > > Thanks you for enumerating the options below. > > > > > Three obvious options are: > > > > (1) Use the nearest SVE mode with a full ptrue (as the patch does). > > (2) Use the nearest SVE mode with a 128-bit ptrue. > > (3) Add new modes V16BI, V8BI, V4BI, V2BI, and V1BI. (And possibly BI > >for scalars.) > > Just to be clear, what do you mean by “nearest SVE mode” in this context? > I think he means the smallest SVE mode that has the same unit size as the Adv. SIMD register. I think the idea is that we're consistent with the modes used so we don't end up using e.g. VNx16QI and VNx8QI etc for e.g. b0. > > > > > The problem with (1) is that, as Tamar pointed out, it doesn't work > > properly with reductions. It also isn't safe for this patch (without > > fast-mathy options) because of FP exceptions. Although writing to > > a scalar FP register zeros the upper bits, and so gives a "safe" value > > for this particular operation, nothing guarantees that all SF and DF > > values have this zero-extended form. They could come from subregs of > > Advanced SIMD or SVE vectors. The ABI also doesn't guarantee that > > incoming SF and DF values are zero-extended. > > > > (2) would be safe, but would mean that we continue to have an nunits > > disagreement between the data mode and the predicate mode. This would > > prevent operations being described in generic RTL in future. 
> > > > (3) is probably the cleanest representional approach, but has the problem > > that we cannot store a fixed-length portion of an SVE predicate. > > We would have to load and store the modes via other register classes. > > (With PMOV we could use scalar FP loads and stores, but otherwise > > we'd probably need secondary memory reloads.) That said, we could > > tell the RA to spill in a full predicate mode, so this shouldn't be > > a problem unless the modes somehow get exposed to gimple or frontends. > > > > WDYT? > > IMO option (2) sounds the more appealing at this stage. To me it feels > conceptually straightforward as we are using a SVE operation clamped at > 128 bits to “emulate” what should have been an 128-bit fixed-width mode > operation. > It also feels that, given the complexity of (3) and introducing new modes, > we should go for (3) only if/when we do decide to implement these ops with > generic RTL. 2 i
[PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e.
0001-RISC-V-libgcc-Fix-incorrect-.cfi_offset-for-saving-r.patch Description: Binary data
Fwd: [patch, fortran] Implement maxloc and minloc for unsigned
Hello world, the original message seems to have been rejected because the patch was too big. The patch (which was not rejected for fortran@) can be found at https://gcc.gnu.org/pipermail/fortran/2024-October/061127.html Forwarded Message Subject: [patch, fortran] Implement maxloc and minloc for unsigned Date: Fri, 4 Oct 2024 09:54:37 +0200 From: Thomas Koenig To: fort...@gcc.gnu.org , gcc-patches Hello world, please find attached the patch for implementing MAXLOC and MINLOC for unsigned. The patch is rather lengthy, but mostly due to combinatorial explosion with the different return values. Next time we update the ABI, we should treat MAXLOC and MINLOC like we already do for FINDLOC - have one version in the library, and convert in the front end when the user requests a different integer kind. This finishes the support of all reasonable intrinsics for UNSIGNED (or so I think - if anybody spots something reasonable, just let me know). The next step would then be ISO_C_BINDING; clean interfaces to C are one of the main reasons why people want UNSIGNED in Fortran. Regression-tested. OK for trunk? Best regards Thomas gcc/fortran/ChangeLog: * check.cc (gfc_check_minloc_maxloc): Handle BT_UNSIGNED. * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Likewise. * gfortran.texi: Document MAXLOC and MINLOC for UNSIGNED. libgfortran/ChangeLog: * Makefile.am: Add files for unsigned MINLOC and MAXLOC. * Makefile.in: Regenerated. * gfortran.map: Add files for unsigned MINLOC and MAXLOC. * generated/maxloc0_16_m1.c: New file. * generated/maxloc0_16_m16.c: New file. * generated/maxloc0_16_m2.c: New file. * generated/maxloc0_16_m4.c: New file. * generated/maxloc0_16_m8.c: New file. * generated/maxloc0_4_m1.c: New file. * generated/maxloc0_4_m16.c: New file. * generated/maxloc0_4_m2.c: New file. * generated/maxloc0_4_m4.c: New file. * generated/maxloc0_4_m8.c: New file. * generated/maxloc0_8_m1.c: New file. * generated/maxloc0_8_m16.c: New file. 
* generated/maxloc0_8_m2.c: New file. * generated/maxloc0_8_m4.c: New file. * generated/maxloc0_8_m8.c: New file. * generated/maxloc1_16_m1.c: New file. * generated/maxloc1_16_m2.c: New file. * generated/maxloc1_16_m4.c: New file. * generated/maxloc1_16_m8.c: New file. * generated/maxloc1_4_m1.c: New file. * generated/maxloc1_4_m16.c: New file. * generated/maxloc1_4_m2.c: New file. * generated/maxloc1_4_m4.c: New file. * generated/maxloc1_4_m8.c: New file. * generated/maxloc1_8_m1.c: New file. * generated/maxloc1_8_m16.c: New file. * generated/maxloc1_8_m2.c: New file. * generated/maxloc1_8_m4.c: New file. * generated/maxloc1_8_m8.c: New file. * generated/minloc0_16_m1.c: New file. * generated/minloc0_16_m16.c: New file. * generated/minloc0_16_m2.c: New file. * generated/minloc0_16_m4.c: New file. * generated/minloc0_16_m8.c: New file. * generated/minloc0_4_m1.c: New file. * generated/minloc0_4_m16.c: New file. * generated/minloc0_4_m2.c: New file. * generated/minloc0_4_m4.c: New file. * generated/minloc0_4_m8.c: New file. * generated/minloc0_8_m1.c: New file. * generated/minloc0_8_m16.c: New file. * generated/minloc0_8_m2.c: New file. * generated/minloc0_8_m4.c: New file. * generated/minloc0_8_m8.c: New file. * generated/minloc1_16_m1.c: New file. * generated/minloc1_16_m16.c: New file. * generated/minloc1_16_m2.c: New file. * generated/minloc1_16_m4.c: New file. * generated/minloc1_16_m8.c: New file. * generated/minloc1_4_m1.c: New file. * generated/minloc1_4_m16.c: New file. * generated/minloc1_4_m2.c: New file. * generated/minloc1_4_m4.c: New file. * generated/minloc1_4_m8.c: New file. * generated/minloc1_8_m1.c: New file. * generated/minloc1_8_m16.c: New file. * generated/minloc1_8_m2.c: New file. * generated/minloc1_8_m4.c: New file. * generated/minloc1_8_m8.c: New file.
[PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
This doesn't really belong in our testsuite, because the sole purpose of the new test is to find bugs in the Glibc wrappers (like the one linked below). But maybe it's a kindness to do it in our testsuite, because we already have this test in place, and one Glibc bug was already found thanks to Sam running the existing test with _FORTIFY_SOURCE defined. Should we do this? -- >8 -- Add a new testcase that repeats 17_intro/names.cc but with _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed now). libstdc++-v3/ChangeLog: PR libstdc++/116210 * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc that use it in the fortify wrappers. * testsuite/17_intro/names_fortify.cc: New test. --- libstdc++-v3/testsuite/17_intro/names.cc | 7 +++ libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++ 2 files changed, 13 insertions(+) create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc index 6b9a3639aad..bbf45b93dee 100644 --- a/libstdc++-v3/testsuite/17_intro/names.cc +++ b/libstdc++-v3/testsuite/17_intro/names.cc @@ -377,4 +377,11 @@ #undef y #endif +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41) +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052 +# undef sz +# endif +#endif + #include diff --git a/libstdc++-v3/testsuite/17_intro/names_fortify.cc b/libstdc++-v3/testsuite/17_intro/names_fortify.cc new file mode 100644 index 000..c975412074b --- /dev/null +++ b/libstdc++-v3/testsuite/17_intro/names_fortify.cc @@ -0,0 +1,6 @@ +// { dg-do compile { target *-*-linux* } } +// { dg-add-options no_pch } + +#define _FORTIFY_SOURCE 2 +// Now we can define the macros to poison uses of non-reserved names: +#include "names.cc" -- 2.46.1
[patch,avr] Implement TARGET_FLOATN_MODE
This patch implements TARGET_FLOATN_MODE which maps _Float32[x] to SFmode and _Float64[x] to DFmode. There is currently no library support for extended float types, but these settings are more reasonable for avr (and they make more tests pass). Ok for trunk? Johann -- AVR: Implement TARGET_FLOATN_MODE. gcc/ * config/avr/avr.cc (avr_floatn_mode): New static function. (TARGET_FLOATN_MODE): New define.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc index 92013c3845d..b73c251b64b 100644 --- a/gcc/config/avr/avr.cc +++ b/gcc/config/avr/avr.cc @@ -15473,6 +15473,24 @@ avr_c_mode_for_floating_type (tree_index ti) } +/* Implement `TARGET_FLOATN_MODE'. */ + +static opt_scalar_float_mode +avr_floatn_mode (int n, bool /*extended*/) +{ + if (n == 32) +return SFmode; + + // Notice that -m[long-]double= just tells which library (AVR-LibC + // or libgcc/libf7) is providing symbols like sin. DFmode support + // is provided by libf7 no matter what. + if (n == 64) +return DFmode; + + return opt_scalar_float_mode (); +} + + /* Worker function for `FLOAT_LIB_COMPARE_RETURNS_BOOL'. */ bool @@ -15705,6 +15723,9 @@ avr_use_lra_p () #undef TARGET_C_MODE_FOR_FLOATING_TYPE #define TARGET_C_MODE_FOR_FLOATING_TYPE avr_c_mode_for_floating_type +#undef TARGET_FLOATN_MODE +#define TARGET_FLOATN_MODE avr_floatn_mode + gcc_target targetm = TARGET_INITIALIZER;
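The mapping implemented by the hook above can be modelled outside of GCC with a few lines of plain C. This is only a sketch for readers who want to see the decision table in isolation: the enum `model_mode` and the function `model_floatn_mode` are hypothetical stand-ins, not GCC types or hooks.

```c
#include <stdbool.h>

/* Stand-ins for GCC's machine modes; purely illustrative.  */
enum model_mode { MODE_NONE, MODE_SF, MODE_DF };

/* Model of the mapping in the patch: N is the bit width requested by
   _FloatN or _FloatNx, EXTENDED is true for the "x" variants.  */
enum model_mode
model_floatn_mode (int n, bool extended)
{
  (void) extended;  /* The patch ignores this argument as well.  */
  if (n == 32)
    return MODE_SF;
  if (n == 64)
    return MODE_DF;
  return MODE_NONE; /* No mode, i.e. the type is not supported.  */
}
```

Because the `extended` flag is ignored, _Float32x ends up with the same mode as _Float32, which matches the "_Float32[x] to SFmode" wording in the description.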
Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved
Hi! On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau wrote: > > ping > > Thu, 20 Jun 2024 at 12:09, Siarhei Volkau wrote: > > > > This patch deals with consequences but not the root cause though. > > > > There are 5 cases which are subject to rewriting: > > case #1: > > mov ip, r1 > > add r2, ip > > # ip is dead here > > can be rewritten as: > > adds r2, r1 Why replace 'add' with 'adds' ? Thanks, Christophe > > > > case #2: > > add ip, r1 > > mov r1, ip > > # ip is dead here > > can be rewritten as: > > add r1, ip > > > > case #3: > > mov ip, r1 > > add r2, ip > > add r3, ip > > # ip is dead here > > can be rewritten as: > > adds r2, r1 > > adds r3, r1 > > > > case #4: > > mov ip, r1 > > add ip, r2 > > mov r1, ip > > can be rewritten as: > > adds r1, r2 > > mov ip, r1 <- might be eliminated too, if ip is dead > > > > case #5 (arbitrary): > > mov r1, ip > > subs r2, r1, r2 > > mov ip, r2 > > # r1 is dead here > > can be rewritten as: > > rsbs r1, r2, #0 > > add ip, r1 > > movs r2, ip <- might be eliminated, if r2 is dead > > > > Speed profit wasn't checked but size changes are the following: > >libgcc: -132 bytes / -0.25% > > libc: -1262 bytes / -0.55% > > libm: -384 bytes / -0.42% > > libstdc++: -2258 bytes / -0.30% > > > > No tests provided because it's hard to force GCC to emit HI_REGS > > in a small and straightforward function. 
> > > > Signed-off-by: Siarhei Volkau > > --- > > gcc/config/arm/thumb1.md | 93 +++- > > 1 file changed, 92 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > index d7074b43f60..9da4af9eccd 100644 > > --- a/gcc/config/arm/thumb1.md > > +++ b/gcc/config/arm/thumb1.md > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn" > > (set_attr "conds" "clob") > > (set_attr "type" "multiple")] > > ) > > - > > + > > +;; bad code emitted when HI_REGS involved in addition > > +;; subtract also might happen rarely > > + > > +;; case #1: > > +;; mov ip, r1 > > +;; add r2, ip # ip is dead after that > > +(define_peephole2 > > + [(set (match_operand:SI 0 "register_operand" "") > > + (match_operand:SI 1 "register_operand" "")) > > + (set (match_operand:SI 2 "register_operand" "") > > + (plus:SI (match_dup 2) (match_dup 0)))] > > + "TARGET_THUMB1 > > +&& peep2_reg_dead_p (2, operands[0]) > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > + [(set (match_dup 2) > > + (plus:SI (match_dup 2) (match_dup 1)))] > > + "") > > + > > +;; case #2: > > +;; add ip, r1 > > +;; mov r1, ip # ip is dead after that > > +(define_peephole2 > > + [(set (match_operand:SI 0 "register_operand" "") > > + (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" ""))) > > + (set (match_dup 1) (match_dup 0))] > > + "TARGET_THUMB1 > > +&& peep2_reg_dead_p (2, operands[0]) > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > + [(set (match_dup 1) > > + (plus:SI (match_dup 1) (match_dup 0)))] > > + "") > > + > > +;; case #3: > > +;; mov ip, r1 > > +;; add r2, ip > > +;; add r3, ip # ip is dead after that > > +(define_peephole2 > > + [(set (match_operand:SI 0 "register_operand" "") > > + (match_operand:SI 1 "register_operand" "")) > > + (set (match_operand:SI 2 "register_operand" "") > > + (plus:SI (match_dup 2) (match_dup 0))) > > + (set (match_operand:SI 3 "register_operand" "") > > + (plus:SI (match_dup 3) (match_dup 
0)))] > > + "TARGET_THUMB1 > > +&& peep2_reg_dead_p (3, operands[0]) > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > + [(set (match_dup 2) > > + (plus:SI (match_dup 2) (match_dup 1))) > > + (set (match_dup 3) > > + (plus:SI (match_dup 3) (match_dup 1)))] > > + "") > > + > > +;; case #4: > > +;; mov ip, r1 > > +;; add ip, r2 > > +;; mov r1, ip > > +(define_peephole2 > > + [(set (match_operand:SI 0 "register_operand" "") > > + (match_operand:SI 1 "register_operand" "")) > > + (set (match_dup 0) > > + (plus:SI (match_dup 0) (match_operand:SI 2 "register_operand" ""))) > > + (set (match_dup 1) > > + (match_dup 0))] > > + "TARGET_THUMB1 > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > + [(set (match_dup 1) > > + (plus:SI (match_dup 1) (match_dup 2))) > > + (set (match_dup 0) (match_dup 1))] ;; likely will be eliminated > > + "") > > + > > +;; case #5: > > +;; mov r1, ip > > +;; subs r2, r1, r2 > > +;; mov ip, r2 # r1 is dead after > > +(define_peephole2 > > + [(set (match_operand:SI 1 "register_operand" "") > > + (match_operand:SI 0 "register_operand" "")) > > + (set (match_operand:SI 2 "register_operand" "") > > +(minus:SI (match_dup 1) (match_dup 2))) > > + (set (match_dup 0) > > + (match_dup 2))] > > + "TARGET_THUMB1 > > +&& peep2_reg_dead_p (3, operands[1]) > > +&& REGNO_REG_CLASS (REGNO (operan
Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
On 2024-10-04 07:52, Jonathan Wakely wrote: This doesn't really belong in our testsuite, because the sole purpose of the new test is to find bugs in the Glibc wrappers (like the one linked below). But maybe it's a kindness to do it in our testsuite, because we already have this test in place, and one Glibc bug was already found thanks to Sam running the existing test with _FORTIFY_SOURCE defined. Should we do this? -- >8 -- Add a new testcase that repeats 17_intro/names.cc but with _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed now). libstdc++-v3/ChangeLog: PR libstdc++/116210 * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc that use it in the fortify wrappers. * testsuite/17_intro/names_fortify.cc: New test. --- libstdc++-v3/testsuite/17_intro/names.cc | 7 +++ libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++ 2 files changed, 13 insertions(+) create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc index 6b9a3639aad..bbf45b93dee 100644 --- a/libstdc++-v3/testsuite/17_intro/names.cc +++ b/libstdc++-v3/testsuite/17_intro/names.cc @@ -377,4 +377,11 @@ #undef y #endif +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41) +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052 +# undef sz +# endif +#endif We've backported the fix to stable branches, so the version check isn't really that reliable. Sid
Re: [PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e.
On 10/4/24 1:23 AM, Tsung Chun Lin wrote: 0001-RISC-V-libgcc-Fix-incorrect-.cfi_offset-for-saving-r.patch From 8b3c5ebe8aacbcc4ddf1be8dea9a555e7e1bcc39 Mon Sep 17 00:00:00 2001 From: Jim Lin Date: Fri, 4 Oct 2024 14:48:12 +0800 Subject: [PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e. libgcc/ChangeLog: * config/riscv/save-restore.S: Fix .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e. Thanks. Looks correct to me and I've pushed it to the trunk. I checked all the other .cfi_offsets and they looked correct to me. Curious, how did you find this (and the other error you fixed recently)? Jeff
Re: [PATCH 1/2] gcc: make Valgrind errors fatal during bootstrap
On 10/2/24 8:39 PM, Sam James wrote: Valgrind doesn't error out by default which means bootstrap issues like in PR116945 can easily be missed: pass --error-exitcode=1 to handle this. While here, also set --trace-children=yes to cover child processes of tools invoked during the build. Note that this only handles tools invoked during the build, it doesn't cover everything that --enable-checking=valgrind does. gcc/ChangeLog: PR other/116945 PR other/116947 * configure: Regenerate. * configure.ac (valgrind_cmd): Pass additional options. But is this going to cause all bootstraps with Ada to fail? That's how I read 116945 which was closed as WONTFIX. Or am I mis-interpreting that BZ and its interaction with this patch? jeff
Re: [PATCH 3/3] gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote: > > While looking into the ifcombine, I noticed that rewrite_to_defined_overflow > was rewriting already defined code. In the previous attempt at fixing this, > the review mentioned we should not be calling rewrite_to_defined_overflow > in those cases. The places which called rewrite_to_defined_overflow didn't > always check the lhs of the assignment. This fixes the problem by > introducing a helper function which is to be used before calling > rewrite_to_defined_overflow. > > Bootstrapped and tested on x86_64-linux-gnu. OK. Thanks, Richard. > gcc/ChangeLog: > > PR tree-optimization/111276 > * gimple-fold.cc (arith_code_with_undefined_signed_overflow): Make > static. > (gimple_with_undefined_signed_overflow): New function. > * gimple-fold.h (arith_code_with_undefined_signed_overflow): Remove. > (gimple_with_undefined_signed_overflow): Add declaration. > * tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Use > gimple_with_undefined_signed_overflow instead of manually > checking lhs and the code of the stmt. > (predicate_statements): Likewise. > * tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute): Likewise. > * tree-ssa-loop-im.cc (move_computations_worker): Likewise. > * tree-ssa-reassoc.cc (update_range_test): Likewise. Reformat. > * tree-scalar-evolution.cc (final_value_replacement_loop): Use > gimple_with_undefined_signed_overflow instead of > arith_code_with_undefined_signed_overflow. > * tree-ssa-loop-split.cc (split_loop): Likewise. 
> > Signed-off-by: Andrew Pinski > --- > gcc/gimple-fold.cc | 26 ++- > gcc/gimple-fold.h| 2 +- > gcc/tree-if-conv.cc | 16 +++ > gcc/tree-scalar-evolution.cc | 5 + > gcc/tree-ssa-ifcombine.cc| 10 ++--- > gcc/tree-ssa-loop-im.cc | 6 +- > gcc/tree-ssa-loop-split.cc | 5 + > gcc/tree-ssa-reassoc.cc | 40 +++- > 8 files changed, 50 insertions(+), 60 deletions(-) > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index 942de7720fd..0b49d6754e2 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -8991,7 +8991,7 @@ gimple_fold_indirect_ref (tree t) > integer types involves undefined behavior on overflow and the > operation can be expressed with unsigned arithmetic. */ > > -bool > +static bool > arith_code_with_undefined_signed_overflow (tree_code code) > { >switch (code) > @@ -9008,6 +9008,30 @@ arith_code_with_undefined_signed_overflow (tree_code > code) > } > } > > +/* Return true if STMT has an operation that operates on a signed > + integer types involves undefined behavior on overflow and the > + operation can be expressed with unsigned arithmetic. 
*/ > + > +bool > +gimple_with_undefined_signed_overflow (gimple *stmt) > +{ > + if (!is_gimple_assign (stmt)) > +return false; > + tree lhs = gimple_assign_lhs (stmt); > + if (!lhs) > +return false; > + tree lhs_type = TREE_TYPE (lhs); > + if (!INTEGRAL_TYPE_P (lhs_type) > + && !POINTER_TYPE_P (lhs_type)) > +return false; > + if (!TYPE_OVERFLOW_UNDEFINED (lhs_type)) > +return false; > + if (!arith_code_with_undefined_signed_overflow > + (gimple_assign_rhs_code (stmt))) > +return false; > + return true; > +} > + > /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic > operation that can be transformed to unsigned arithmetic by converting > its operand, carrying out the operation in the corresponding unsigned > diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h > index dc709d515a9..165325392c9 100644 > --- a/gcc/gimple-fold.h > +++ b/gcc/gimple-fold.h > @@ -59,7 +59,7 @@ extern tree gimple_get_virt_method_for_vtable > (HOST_WIDE_INT, tree, > extern tree gimple_fold_indirect_ref (tree); > extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *); > extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *); > -extern bool arith_code_with_undefined_signed_overflow (tree_code); > +extern bool gimple_with_undefined_signed_overflow (gimple *); > extern void rewrite_to_defined_overflow (gimple_stmt_iterator *); > extern gimple_seq rewrite_to_defined_overflow (gimple *); > extern void replace_call_with_value (gimple_stmt_iterator *, tree); > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > index 3b04d1e8d34..f5aa6c04fc9 100644 > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -1067,11 +1067,7 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt, > fprintf (dump_file, "tree could trap...\n"); >return false; > } > - else if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > - || POINTER_TYPE_P (TREE_TYPE (lhs))) > - && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs)) > - && arith_code_with_undefined_signed_overflow > -
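As background for the helper discussed in this patch: the point of rewriting a statement "to defined overflow" is to carry out a signed operation in the corresponding unsigned type, where wraparound is well defined. The following is a minimal source-level C sketch of that idea only. It is not what rewrite_to_defined_overflow does on GIMPLE, and the function name `add_in_unsigned` is made up for illustration.

```c
/* Signed addition carried out in the corresponding unsigned type, so
   the addition itself can never overflow in the undefined sense.  The
   conversion back to int is implementation-defined for out-of-range
   values; GCC documents it as reduction modulo 2^N.  */
int
add_in_unsigned (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}
```

A statement whose type already has defined overflow (for example an unsigned addition, or signed arithmetic under -fwrapv) gains nothing from this transformation, which is why a check before the rewrite is useful.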
Re: [PATCH 2/3] cfgexpand: Handle scope conflicts better [PR111422]
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote: > > After fixing loop-im to do the correct overflow rewriting > for pointer types too, we end up with code like: > ``` > _9 = (unsigned long) &g; > _84 = _9 + 18446744073709551615; > _11 = _42 + _84; > _44 = (signed char *) _11; > ... > *_44 = 10; > g ={v} {CLOBBER(eos)}; > ... > n[0] = &f; > *_44 = 8; > g ={v} {CLOBBER(eos)}; > ``` > Which was not being recognized by the scope conflicts code. > This was because it only handled one level walk backs rather than multiple > ones. > This fixes it by using a work_list to avoid huge recursion and a visited > bitmap to avoid > going into infinite loops when dealing with loops. Ick. This is now possibly an unbound walk from every use (even duplicate use!). Micro-optimizing would be restricting the INTEGRAL_TYPE_P types to ones matching pointer size. Another micro-optimization would be to track/cache whether a SSA def is based on a pointer, more optimizing to cache what pointer(s!) it is based on. There are testcases in bugzilla somewhere hard on compile-time in this code and I can imagine a trivial degenerate one to trigger the issue. Richard. > Bootstrapped and tested on x86_64-linux-gnu. > > PR tree-optimization/111422 > > gcc/ChangeLog: > > * cfgexpand.cc (add_scope_conflicts_2): Rewrite to be a full walk > of all operands and their uses. > > Signed-off-by: Andrew Pinski > --- > gcc/cfgexpand.cc | 46 +++--- > 1 file changed, 27 insertions(+), 19 deletions(-) > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > index 6c1096363af..2e653d7207c 100644 > --- a/gcc/cfgexpand.cc > +++ b/gcc/cfgexpand.cc > @@ -573,32 +573,40 @@ visit_conflict (gimple *, tree op, tree, void *data) > > /* Helper function for add_scope_conflicts_1. For USE on > a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to be > - based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. */ > + based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. 
Also walk > + the assignments backwards as they might be based on an ADDR_EXPR. */ > > -static inline void > +static void > add_scope_conflicts_2 (tree use, bitmap work, >walk_stmt_load_store_addr_fn visit) > { > - if (TREE_CODE (use) == SSA_NAME > - && (POINTER_TYPE_P (TREE_TYPE (use)) > - || INTEGRAL_TYPE_P (TREE_TYPE (use > + auto_vec work_list; > + auto_bitmap visited_ssa_names; > + work_list.safe_push (use); > + > + while (!work_list.is_empty()) > { > - gimple *g = SSA_NAME_DEF_STMT (use); > - if (gassign *a = dyn_cast (g)) > + use = work_list.pop(); > + if (!use) > + continue; > + if (TREE_CODE (use) == ADDR_EXPR) > + visit (nullptr, TREE_OPERAND (use, 0), use, work); > + else if (TREE_CODE (use) == SSA_NAME > + && (POINTER_TYPE_P (TREE_TYPE (use)) > + || INTEGRAL_TYPE_P (TREE_TYPE (use > { > - if (tree op = gimple_assign_rhs1 (a)) > - if (TREE_CODE (op) == ADDR_EXPR) > - visit (a, TREE_OPERAND (op, 0), op, work); > + gimple *g = SSA_NAME_DEF_STMT (use); > + if (!bitmap_set_bit (visited_ssa_names, SSA_NAME_VERSION(use))) > + continue; > + if (gassign *a = dyn_cast (g)) > + { > + for (unsigned i = 1; i < gimple_num_ops (g); i++) > + work_list.safe_push (gimple_op (a, i)); > + } > + else if (gphi *p = dyn_cast (g)) > + for (unsigned i = 0; i < gimple_phi_num_args (p); ++i) > + work_list.safe_push (gimple_phi_arg_def (p, i)); > } > - else if (gphi *p = dyn_cast (g)) > - for (unsigned i = 0; i < gimple_phi_num_args (p); ++i) > - if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME) > - if (gassign *a = dyn_cast (SSA_NAME_DEF_STMT (use))) > - { > - if (tree op = gimple_assign_rhs1 (a)) > - if (TREE_CODE (op) == ADDR_EXPR) > - visit (a, TREE_OPERAND (op, 0), op, work); > - } > } > } > > -- > 2.34.1 >
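The work-list-plus-visited-set pattern used in the patch can be shown on a toy graph. The sketch below is a simplified model, not GCC code: `struct node`, `enum node_kind` and `count_reachable_addrs` are hypothetical names, nodes are identified by array index, and every node (not only SSA names, as in the patch) is marked as visited.

```c
#include <stdbool.h>

#define MAX_NODES 64
#define MAX_OPS 4

/* NODE_ADDR models an ADDR_EXPR leaf, NODE_DEF models an assignment or
   PHI whose operands are walked, NODE_OTHER is anything else.  */
enum node_kind { NODE_ADDR, NODE_DEF, NODE_OTHER };

struct node
{
  enum node_kind kind;
  int num_ops;
  int ops[MAX_OPS];  /* Indices of the operand nodes.  */
};

/* Walk backwards from START through operand edges and return how many
   distinct NODE_ADDR leaves are reached, or -1 on invalid input.  Each
   node is expanded at most once, so cycles (loops) terminate.  */
int
count_reachable_addrs (const struct node *nodes, int num_nodes, int start)
{
  /* Every node is expanded at most once and pushes at most MAX_OPS
     entries, so this bound cannot be exceeded.  */
  int work_list[MAX_NODES * MAX_OPS + 1];
  bool visited[MAX_NODES] = { false };
  int top = 0, found = 0;

  if (num_nodes > MAX_NODES || start < 0 || start >= num_nodes)
    return -1;

  work_list[top++] = start;
  while (top > 0)
    {
      int n = work_list[--top];
      if (n < 0 || n >= num_nodes || visited[n])
        continue;
      visited[n] = true;
      if (nodes[n].kind == NODE_ADDR)
        found++;
      else if (nodes[n].kind == NODE_DEF)
        for (int i = 0; i < nodes[n].num_ops && i < MAX_OPS; i++)
          work_list[top++] = nodes[n].ops[i];
    }
  return found;
}
```

The visited set bounds the work for a single starting use, but nothing is shared between calls, so starting a fresh walk from every use can still repeat the same traversal many times. Caching per-definition results, as suggested in the review, would address that.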
Re: [PATCH 1/3] aarch64: libgcc: Cleanup warnings in lse.S
> On 3 Oct 2024, at 21:44, Christophe Lyon wrote:
>
> Since
> Commit c608ada288ced0268c1fd4136f56c34b24d4
> Author: Zac Walker
> CommitDate: 2024-01-23 15:32:30 +
>
> Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32`
> target
>
> lse.S includes aarch64-asm.h, leading to a conflicting definition of macro
> 'L':
> - in lse.S it expands to either '' or 'L'
> - in aarch64-asm.h it is used to generate .L ## label
>
> lse.S does not use the second, so this patch just undefines L after
> the inclusion of aarch64-asm.h.

Ok.
Thanks,
Kyrill

>
> libgcc/
> * config/aarch64/lse.S: Undefine L() macro.
> ---
> libgcc/config/aarch64/lse.S | 4
> 1 file changed, 4 insertions(+)
>
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> index ecef47086c6..77b3dc5a981 100644
> --- a/libgcc/config/aarch64/lse.S
> +++ b/libgcc/config/aarch64/lse.S
> @@ -54,6 +54,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.
> If not, see
> #include "aarch64-asm.h"
> #include "auto-target.h"
>
> +/* L is defined in aarch64-asm.h for a different purpose than why we
> + use it here. */
> +#undef L
> +
> /* Tell the assembler to accept LSE instructions. */
> #ifdef HAVE_AS_LSE
> .arch armv8-a+lse
> --
> 2.34.1
>
[COMMITTED 1/2] testsuite: add missing braces around dejagnu directives
gcc/testsuite/ChangeLog: * c-c++-common/analyzer/flex-without-call-summaries.c: Add missing brace. * c-c++-common/analyzer/malloc-callbacks.c: Ditto. * gcc.dg/Wstringop-overflow-79.c: Ditto. * gcc.dg/Wstringop-overflow-80.c: Ditto. --- .../analyzer/flex-without-call-summaries.c| 2 +- .../c-c++-common/analyzer/malloc-callbacks.c | 2 +- gcc/testsuite/gcc.dg/Wstringop-overflow-79.c | 28 +-- gcc/testsuite/gcc.dg/Wstringop-overflow-80.c | 28 +-- 4 files changed, 30 insertions(+), 30 deletions(-) diff --git a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c index 092d78486219..e68ac2f3b749 100644 --- a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c +++ b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c @@ -889,7 +889,7 @@ static int yy_get_next_buffer (void) } else /* Can't grow it, we don't own it. */ - b->yy_ch_buf = NULL; /* { dg-bogus "leak" "PR analyzer/103546" */ + b->yy_ch_buf = NULL; /* { dg-bogus "leak" "PR analyzer/103546" } */ if ( ! 
b->yy_ch_buf ) YY_FATAL_ERROR( diff --git a/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c b/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c index 0ba4f3824c62..422b40373634 100644 --- a/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c +++ b/gcc/testsuite/c-c++-common/analyzer/malloc-callbacks.c @@ -64,7 +64,7 @@ void test_5 (void) { allocator_t alloc_fn = get_alloca (); deallocator_t dealloc_fn = get_free (); - int *ptr = (int *) alloc_fn (sizeof (int)); /* dg-message "region created on stack here" } */ + int *ptr = (int *) alloc_fn (sizeof (int)); /* { dg-message "region created on stack here" } */ dealloc_fn (ptr); /* { dg-warning "'free' of 'ptr' which points to memory on the stack" } */ } diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c index 15eb26fbdb73..e97cb91ba18d 100644 --- a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c +++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c @@ -5,8 +5,8 @@ { dg-do compile } { dg-options "-O0 -Wno-array-bounds" } */ -extern char a[8]; // dg-message at offset \\\[3, 6] into destination object 'a'" "note 1" } - // dg-message at offset \\\[5, 8] into destination object 'a'" "note 2" { target *-*-* } .-1 } +extern char a[8]; // { dg-message "at offset \\\[3, 6] into destination object 'a'" "note 1" } + // { dg-message "at offset \\\[5, 8] into destination object 'a'" "note 2" { target *-*-* } .-1 } void test_2_notes (int i) { @@ -15,9 +15,9 @@ void test_2_notes (int i) } -extern char b[8]; // dg-message at offset \\\[3, 6] into destination object 'b'" "note 1" } - // dg-message at offset \\\[4, 7] into destination object 'b'" "note 2" { target *-*-* } .-1 } - // dg-message at offset \\\[5, 8] into destination object 'b'" "note 3" { target *-*-* } .-2 } +extern char b[8]; // { dg-message "at offset \\\[3, 6] into destination object 'b'" "note 1" } + // { dg-message "at offset \\\[4, 7] into destination object 'b'" "note 2" { target *-*-* } .-1 } + // { 
dg-message "at offset \\\[5, 8] into destination object 'b'" "note 3" { target *-*-* } .-2 } void test_3_notes (int i) { @@ -26,10 +26,10 @@ void test_3_notes (int i) } -extern char c[8]; // dg-message at offset \\\[3, 6] into destination object 'c'" "note 1" } - // dg-message at offset \\\[4, 7] into destination object 'c'" "note 2" { target *-*-* } .-1 } - // dg-message at offset \\\[5, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } - // dg-message at offset \\\[6, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } +extern char c[8]; // { dg-message "at offset \\\[3, 6] into destination object 'c'" "note 1" } + // { dg-message "at offset \\\[4, 7] into destination object 'c'" "note 2" { target *-*-* } .-1 } + // { dg-message "at offset \\\[5, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } + // { dg-message "at offset \\\[6, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } void test_4_notes (int i) { @@ -47,11 +47,11 @@ void test_4_notes (int i) } -extern char d[8]; // dg-me
[COMMITTED 2/2] testsuite: fix two newly-running -Wstringop-overflow test directives
This didn't show up until the previous commit which fixed the directive syntax. The indexing was off for the notes. gcc/testsuite/ChangeLog: * gcc.dg/Wstringop-overflow-79.c: Fix index for notes. * gcc.dg/Wstringop-overflow-80.c: Ditto. --- gcc/testsuite/gcc.dg/Wstringop-overflow-79.c | 6 +++--- gcc/testsuite/gcc.dg/Wstringop-overflow-80.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c index e97cb91ba18d..87bf775c0b2b 100644 --- a/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c +++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-79.c @@ -29,7 +29,7 @@ void test_3_notes (int i) extern char c[8]; // { dg-message "at offset \\\[3, 6] into destination object 'c'" "note 1" } // { dg-message "at offset \\\[4, 7] into destination object 'c'" "note 2" { target *-*-* } .-1 } // { dg-message "at offset \\\[5, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } - // { dg-message "at offset \\\[6, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } + // { dg-message "at offset \\\[6, 8] into destination object 'c'" "note 4" { target *-*-* } .-3 } void test_4_notes (int i) { @@ -50,8 +50,8 @@ void test_4_notes (int i) extern char d[8]; // { dg-message "at offset \\\[3, 6] into destination object 'd'" "note 1" } // { dg-message "at offset \\\[4, 7] into destination object 'd'" "note 2" { target *-*-* } .-1 } // { dg-message "at offset \\\[5, 8] into destination object 'd'" "note 3" { target *-*-* } .-2 } - // { dg-message "at offset \\\[6, 8] into destination object 'd'" "note 3" { target *-*-* } .-3 } - // { dg-message "at offset \\\[7, 8] into destination object 'd'" "note 3" { target *-*-* } .-4 } + // { dg-message "at offset \\\[6, 8] into destination object 'd'" "note 4" { target *-*-* } .-3 } + // { dg-message "at offset \\\[7, 8] into destination object 'd'" "note 5" { target *-*-* } .-4 } void test_5_notes (int i) { diff --git 
a/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c index c74ca3a7918b..f49b5ffc636b 100644 --- a/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c +++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-80.c @@ -29,7 +29,7 @@ void test_3_notes (int i) extern char c[8]; // { dg-message "at offset \\\[3, 6] into destination object 'c'" "note 1" } // { dg-message "at offset \\\[4, 7] into destination object 'c'" "note 2" { target *-*-* } .-1 } // { dg-message "at offset \\\[5, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } - // { dg-message "at offset \\\[6, 8] into destination object 'c'" "note 3" { target *-*-* } .-2 } + // { dg-message "at offset \\\[6, 8] into destination object 'c'" "note 4" { target *-*-* } .-3 } void test_4_notes (int i) { @@ -50,8 +50,8 @@ void test_4_notes (int i) extern char d[8]; // { dg-message "at offset \\\[3, 6] into destination object 'd'" "note 1" } // { dg-message "at offset \\\[4, 7] into destination object 'd'" "note 2" { target *-*-* } .-1 } // { dg-message "at offset \\\[5, 8] into destination object 'd'" "note 3" { target *-*-* } .-2 } - // { dg-message "at offset \\\[6, 8] into destination object 'd'" "note 3" { target *-*-* } .-3 } - // { dg-message "at offset \\\[7, 8] into destination object 'd'" "note 3" { target *-*-* } .-4 } + // { dg-message "at offset \\\[6, 8] into destination object 'd'" "note 4" { target *-*-* } .-3 } + // { dg-message "at offset \\\[7, 8] into destination object 'd'" "note 5" { target *-*-* } .-4 } void test_5_notes (int i) { -- 2.46.2
Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support
On Fri, 4 Oct 2024 at 10:50, Kyrylo Tkachov wrote:
>
> > On 3 Oct 2024, at 21:44, Christophe Lyon wrote:
> >
> > When --enable-werror is enabled when running the top-level configure,
> > it passes --enable-werror-always to subdirs. Some of them, like
> > libgcc, ignore it.
> >
> > This patch adds support for it, enabled only for aarch64, to avoid
> > breaking bootstrap for other targets.
> >
>
> The aarch64 part is ok but you’ll need a wider libgcc approval.
> It seems to me that if libgcc is intended to compile cleanly with -Werror
> then it should be a libgcc-wide change, but maybe doing it port-by-port is
> the only practical way of getting there?

Indeed, it was not clear to me whether libgcc is supposed to compile
without warnings.  My feeling is that warnings are often worth having a
look at, but without -Werror they go unnoticed.

Adding Ian in cc as libgcc maintainer.

Thanks,

Christophe

> Thanks,
> Kyrill
>
> > The patch also adds -Wno-prio-ctor-dtor to avoid a warning when compiling
> > lse_init.c
> >
> > libgcc/
> > * Makefile.in (WERROR): New.
> > * config/aarch64/t-aarch64: Handle WERROR. Always use
> > -Wno-prio-ctor-dtor.
> > * configure.ac: Add support for --enable-werror-always.
> > * configure: Regenerate.
> > --- > > libgcc/Makefile.in | 1 + > > libgcc/config/aarch64/t-aarch64 | 1 + > > libgcc/configure| 31 +++ > > libgcc/configure.ac | 5 + > > 4 files changed, 38 insertions(+) > > > > diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in > > index 0e46e9ef768..eca62546642 100644 > > --- a/libgcc/Makefile.in > > +++ b/libgcc/Makefile.in > > @@ -84,6 +84,7 @@ AR_FLAGS = rc > > > > CC = @CC@ > > CFLAGS = @CFLAGS@ > > +WERROR = @WERROR@ > > RANLIB = @RANLIB@ > > LN_S = @LN_S@ > > > > diff --git a/libgcc/config/aarch64/t-aarch64 > > b/libgcc/config/aarch64/t-aarch64 > > index b70e7b94edd..ae1588ce307 100644 > > --- a/libgcc/config/aarch64/t-aarch64 > > +++ b/libgcc/config/aarch64/t-aarch64 > > @@ -30,3 +30,4 @@ LIB2ADDEH += \ > >$(srcdir)/config/aarch64/__arm_za_disable.S > > > > SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver > > +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor > > diff --git a/libgcc/configure b/libgcc/configure > > index cff1eff9625..ae56f7dbdc9 100755 > > --- a/libgcc/configure > > +++ b/libgcc/configure > > @@ -592,6 +592,7 @@ enable_execute_stack > > asm_hidden_op > > extra_parts > > cpu_type > > +WERROR > > get_gcc_base_ver > > HAVE_STRUB_SUPPORT > > thread_header > > @@ -719,6 +720,7 @@ enable_tm_clone_registry > > with_glibc_version > > enable_tls > > with_gcc_major_version_only > > +enable_werror_always > > ' > > ac_precious_vars='build_alias > > host_alias > > @@ -1361,6 +1363,7 @@ Optional Features: > > installations without PT_GNU_EH_FRAME support > > --disable-tm-clone-registrydisable TM clone registry > > --enable-tlsUse thread-local storage [default=yes] > > + --enable-werror-always enable -Werror despite compiler version > > > > Optional Packages: > > --with-PACKAGE[=ARG]use PACKAGE [ARG=yes] > > @@ -5808,6 +5811,34 @@ fi > > > > > > > > +# Only enable with --enable-werror-always until existing warnings are > > +# corrected. 
> > +ac_ext=c > > +ac_cpp='$CPP $CPPFLAGS' > > +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' > > +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS > > conftest.$ac_ext $LIBS >&5' > > +ac_compiler_gnu=$ac_cv_c_compiler_gnu > > + > > +WERROR= > > +# Check whether --enable-werror-always was given. > > +if test "${enable_werror_always+set}" = set; then : > > + enableval=$enable_werror_always; > > +else > > + enable_werror_always=no > > +fi > > + > > +if test $enable_werror_always = yes; then : > > + WERROR="$WERROR${WERROR:+ }-Werror" > > +fi > > + > > +ac_ext=c > > +ac_cpp='$CPP $CPPFLAGS' > > +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' > > +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS > > conftest.$ac_ext $LIBS >&5' > > +ac_compiler_gnu=$ac_cv_c_compiler_gnu > > + > > + > > + > > # Substitute configuration variables > > > > > > diff --git a/libgcc/configure.ac b/libgcc/configure.ac > > index 4e8c036990f..6b3ea2aea5c 100644 > > --- a/libgcc/configure.ac > > +++ b/libgcc/configure.ac > > @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4) > > sinclude(../config/gthr.m4) > > sinclude(../config/sjlj.m4) > > sinclude(../config/cet.m4) > > +sinclude(../config/warnings.m4) > > > > AC_INIT([GNU C Runtime Library], 1.0,,[libgcc]) > > AC_CONFIG_SRCDIR([static-object.mk]) > > @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT) > > # Determine what GCC version number to use in filesystem paths. > > GCC_BASE_VER > > > > +# Only enable with --enable-werror-always until existing warnings are > > +# corrected. > > +ACX_PRO
Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]
On 10/4/24 12:42 AM, Richard Biener wrote:
On Thu, Oct 3, 2024 at 3:15 AM Andrew Waterman wrote:
On Wed, Oct 2, 2024 at 4:41 PM Jeff Law wrote:
On 10/2/24 4:39 PM, Andrew Waterman wrote:
On Wed, Oct 2, 2024 at 5:56 AM Jeff Law wrote:
On 9/5/24 12:52 PM, Palmer Dabbelt wrote:

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

PR target/116615
* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.

So on the BPI this is a pretty clear win. Not surprisingly perlbench and
gcc are the big winners. It somewhat surprisingly regresses x264,
deepsjeng & leela, but the magnitudes are smaller. The net from a cycle
perspective is 2.4%. Every benchmark looks better from a branch count
perspective. So in my mind it's just a matter of fixing any testsuite
fallout (I would expect some) and this is OK.

Jeff, were you able to measure the change in static code size, too?
These results are very encouraging, but I'd like to make sure we don't
need to retain the current behavior when optimizing for size.

Codesize is ever so slightly worse. As in less than .1%. Not worth it in
my mind to do something different in that range.

It probably helps code-size when not optimizing for size depending on
how you align jumps.

By default we aren't aligning jumps at all. The infrastructure is in
place to allow uarchs to select their preferences though (we're using
that infrastructure internally).

jeff
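For anyone following along without the background: as far as I understand it, LOGICAL_OP_NON_SHORT_CIRCUIT influences whether GCC is willing to evaluate both sides of a cheap, side-effect-free `&&` / `||` and combine them with a bitwise operation instead of branching. The sketch below is only a source-level analogy of the two shapes (the function names are invented here; the compiler makes this choice on its own IR, not on source):

```cpp
#include <cassert>

// Short-circuit shape: the second comparison is only evaluated when the
// first one holds, which typically means a conditional branch.
bool in_range_short_circuit(int x, int lo, int hi) {
  assert(lo <= hi);
  return x >= lo && x <= hi;
}

// Non-short-circuit shape: both comparisons are always evaluated and the
// results are combined with a bitwise AND, so no branch is needed.
// This is only valid because neither operand has side effects.
bool in_range_non_short_circuit(int x, int lo, int hi) {
  assert(lo <= hi);
  return (static_cast<int>(x >= lo) & static_cast<int>(x <= hi)) != 0;
}
```

Both functions return the same value for every input; which shape is faster depends on the cost of a logical op relative to a (possibly mispredicted) branch, which is what the benchmark numbers in this thread are about.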
Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm
Hi Thiago, On Fri, 1 Mar 2024 at 15:29, Richard Earnshaw (lists) wrote: > > On 01/03/2024 14:23, Andre Vieira (lists) wrote: > > Hi Thiago, > > > > Thanks for this, LGTM but I can't approve this, CC'ing Richard. > > > > Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite' > > from bullet points 2-4. > > > > Yes, this is OK with the change Andre mentioned (your push will fail if you > don't fix that). > > R. > > PS, if you've set up GCC git customizations (see > contrib/gcc-git-customization.sh), you can verify things like this with 'git > gcc-verify HEAD^..HEAD' > ISTM you have forgotten to commit this patch. If you don't have commit rights, I can do it for you. Thanks, Christophe > > > Kind regards, > > Andre > > > > On 13/01/2024 00:55, Thiago Jung Bauermann wrote: > >> Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into > >> permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration > >> into a permerror") these tests fail with errors such as: > >> > >>FAIL: gcc.target/arm/pr59858.c (test for excess errors) > >>FAIL: gcc.target/arm/pr65647.c (test for excess errors) > >>FAIL: gcc.target/arm/pr65710.c (test for excess errors) > >>FAIL: gcc.target/arm/pr97969.c (test for excess errors) > >> > >> Here's one example of the excess errors: > >> > >>FAIL: gcc.target/arm/pr65647.c (test for excess errors) > >>Excess errors: > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: > >> initialization of 'int' from 'int *' makes integer from pointer without a > >> cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: > >> initialization of 'int' from 'int *' makes integer from pointer without a > >> cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: > >> initialization of 'int' from 'int *' makes integer from pointer without a > >> cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: > >> 
initialization of 'int' from 'int *' makes integer from pointer without a > >> cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: > >> initialization of 'int' from 'int *' makes integer from pointer without a > >> cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: > >> initialization of 'int' from 'int *' makes integer from pointer without a > >> cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: > >> initialization of 'int' from 'struct S1 *' makes integer from pointer > >> without a cast [-Wint-conversion] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: > >> implicit declaration of function 'fn3'; did you mean 'fn2'? > >> [-Wimplicit-function-declaration] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: > >> implicit declaration of function 'fn5'; did you mean 'fn4'? > >> [-Wimplicit-function-declaration] > >>/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: > >> implicit declaration of function 'fn6'; did you mean 'fn4'? > >> [-Wimplicit-function-declaration] > >> > >> PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE. > >> PR target/65647 and PR target/97969 test for a compilation infinite loop. > >> > >> Therefore, add -fpermissive so that the tests behave as they did > >> previously. > >> Tested on armv8l-linux-gnueabihf. > >> > >> gcc/testsuite/ChangeLog: > >> * gcc.target/arm/pr59858.c: Add -fpermissive. > >> * gcc/testsuite/gcc.target/arm/pr65647.c: Likewise. > >> * gcc/testsuite/gcc.target/arm/pr65710.c: Likewise. > >> * gcc/testsuite/gcc.target/arm/pr97969.c: Likewise. 
> >> --- > >> gcc/testsuite/gcc.target/arm/pr59858.c | 2 +- > >> gcc/testsuite/gcc.target/arm/pr65647.c | 2 +- > >> gcc/testsuite/gcc.target/arm/pr65710.c | 2 +- > >> gcc/testsuite/gcc.target/arm/pr97969.c | 2 +- > >> 4 files changed, 4 insertions(+), 4 deletions(-) > >> > >> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c > >> b/gcc/testsuite/gcc.target/arm/pr59858.c > >> index 3360b48e8586..9336edfce277 100644 > >> --- a/gcc/testsuite/gcc.target/arm/pr59858.c > >> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c > >> @@ -1,5 +1,5 @@ > >> /* { dg-do compile } */ > >> -/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb > >> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts > >> -fPIC -w" } */ > >> +/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb > >> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts > >> -fPIC -w -fpermissive" } */ > >> /* { dg-require-effective-target fpic } */ > >> /* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft > >> -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */ > >> /* { dg-require-effective-
[PATCH] [PR116831] match.pd: Check trunc_mod vector obtap before folding.
As in https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663185.html,
this patch guards the simplification
  x / y * y == x -> x % y == 0
in match.pd for vector types by a check for:
1) Support of the mod optab for vectors OR
2) Application before vector lowering for non-VL vectors.

The patch was bootstrapped and tested with no regression on
aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?

Signed-off-by: Jennifer Schmitz

gcc/
	PR tree-optimization/116831
	* match.pd: Guard simplification to trunc_mod with check for
	mod optab support.

gcc/testsuite/
	PR tree-optimization/116831
	* gcc.dg/torture/pr116831.c: New test.

0001-PR116831-match.pd-Check-trunc_mod-vector-obtap-befor.patch
Description: Binary data
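To make the simplification concrete, here is a small scalar sketch of the two forms being equated. The helper names are mine, and this says nothing about the vector optab check itself; it only shows that for truncating integer division the two tests agree whenever the division is defined.

```cpp
#include <cassert>

// Left-hand side of the match.pd pattern: divide, multiply back, compare.
// Precondition: y != 0 (and not INT_MIN / -1, which overflows).
bool divisible_via_div_mul(int x, int y) {
  assert(y != 0);
  return x / y * y == x;
}

// Right-hand side of the pattern: a single truncating modulo.
bool divisible_via_mod(int x, int y) {
  assert(y != 0);
  return x % y == 0;
}
```

The second form is cheaper only if the target can actually do the modulo on the type in question, which is why the patch asks for optab support before rewriting vector code.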
Re: [PATCH] aarch64: Expand CTZ to RBIT + CLZ for SVE [PR109498]
> On 1 Oct 2024, at 6:17 PM, Richard Sandiford > wrote: > > External email: Use caution opening links or attachments > > > Soumya AR writes: >> Currently, we vectorize CTZ for SVE by using the following operation: >> .CTZ (X) = (PREC - 1) - .CLZ (X & -X) >> >> Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in >> PR109498. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> OK for mainline? >> >> Signed-off-by: Soumya AR >> >> gcc/ChangeLog: >> PR target/109498 >> * config/aarch64/aarch64-sve.md (ctz2): Added pattern to expand >>CTZ to RBIT + CLZ for SVE. >> >> gcc/testsuite/ChangeLog: >> PR target/109498 >> * gcc.target/aarch64/sve/ctz.c: New test. > > Generally looks good, but a couple of comments: > >> --- >> gcc/config/aarch64/aarch64-sve.md | 16 +++ >> gcc/testsuite/gcc.target/aarch64/sve/ctz.c | 49 ++ >> 2 files changed, 65 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ctz.c >> >> diff --git a/gcc/config/aarch64/aarch64-sve.md >> b/gcc/config/aarch64/aarch64-sve.md >> index bfa28849adf..10094f156b3 100644 >> --- a/gcc/config/aarch64/aarch64-sve.md >> +++ b/gcc/config/aarch64/aarch64-sve.md >> @@ -3088,6 +3088,22 @@ >> ;; - NOT >> ;; - >> >> +(define_expand "ctz2" >> + [(set (match_operand:SVE_I 0 "register_operand") >> + (unspec:SVE_I >> + [(match_dup 2) >> +(ctz:SVE_I >> + (match_operand:SVE_I 1 "register_operand"))] >> + UNSPEC_PRED_X))] >> + "TARGET_SVE" >> + { >> + operands[2] = aarch64_ptrue_reg (mode); > > There's no real need to use operands[...] here. It can just be > a local variable. > >> + emit_insn (gen_aarch64_pred_rbit (operands[0], >> operands[2],operands[1])); >> + emit_insn (gen_aarch64_pred_clz (operands[0], operands[2], >> operands[0])); > > Formatting nit: C++ lines should be 80 characters or fewer. > > More importantly, I think we should use a fresh register for the > temporary (RBIT) result, since that tends to permit more optimisation. 
Thanks for the feedback! Attaching an updated patch with the suggested changes. Regards, Soumya > Thanks, > Richard > > > >> + DONE; >> + } >> +) >> + >> ;; Unpredicated integer unary arithmetic. >> (define_expand "2" >> [(set (match_operand:SVE_I 0 "register_operand") >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/ctz.c >> b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c >> new file mode 100644 >> index 000..433a9174f48 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c >> @@ -0,0 +1,49 @@ >> +/* { dg-final { check-function-bodies "**" "" } } */ >> +/* { dg-options "-O3 --param aarch64-autovec-preference=sve-only" } */ >> + >> +#include >> + >> +#define FUNC(FUNCTION, NAME, DTYPE) \ >> +void \ >> +NAME (DTYPE *__restrict x, DTYPE *__restrict y, int n) { \ >> + for (int i = 0; i < n; i++)\ >> +x[i] = FUNCTION (y[i]); \ >> +}\ >> + >> + >> +/* >> +** ctz_uint8: >> +** ... >> +** rbitz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b >> +** clz z[0-9]+\.b, p[0-7]/m, z[0-9]+\.b >> +** ... >> +*/ >> +FUNC (__builtin_ctzg, ctz_uint8, uint8_t) >> + >> +/* >> +** ctz_uint16: >> +** ... >> +** rbitz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h >> +** clz z[0-9]+\.h, p[0-7]/m, z[0-9]+\.h >> +** ... >> +*/ >> +FUNC (__builtin_ctzg, ctz_uint16, uint16_t) >> + >> +/* >> +** ctz_uint32: >> +** ... >> +** rbitz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s >> +** clz z[0-9]+\.s, p[0-7]/m, z[0-9]+\.s >> +** ... >> +*/ >> +FUNC (__builtin_ctz, ctz_uint32, uint32_t) >> + >> +/* >> +** ctz_uint64: >> +** ... >> +** rbitz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d >> +** clz z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d >> +** ... >> +*/ >> +FUNC (__builtin_ctzll, ctz_uint64, uint64_t) >> + >> -- >> 2.43.2 0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch Description: 0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch
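As a sanity check of the two expansions discussed in this thread, the following scalar sketch compares the old formula, (PREC - 1) - CLZ (X & -X), with the new one, CLZ (RBIT (X)), for 32-bit values. `clz32` and `rbit32` are plain portable stand-ins written for this example, not the SVE instructions or any GCC builtin.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-leading-zeros for a non-zero 32-bit value.
int clz32(std::uint32_t x) {
  assert(x != 0);
  int n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Reverse the bit order of a 32-bit value (what RBIT does per element).
std::uint32_t rbit32(std::uint32_t x) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

// Old expansion: isolate the lowest set bit, then (PREC - 1) - CLZ.
int ctz_via_isolate(std::uint32_t x) {
  return 31 - clz32(x & (~x + 1u));  // ~x + 1u is -x in unsigned arithmetic
}

// New expansion: reverse the bits, then count leading zeros.
int ctz_via_rbit(std::uint32_t x) {
  return clz32(rbit32(x));
}
```

Both functions require a non-zero input here; the point is only that the RBIT + CLZ sequence needs two operations where the older formula needs a negate, an AND, a CLZ and a subtract.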
[PATCH] tree-optimization/99856 - fix testcase
When making the testcase use aligned accesses I botched up the
copy&paste.  Fixed.  Pushed.

	PR tree-optimization/99856
	* gcc.dg/vect/pr99856.c: Fix copy&paste errors.
---
 gcc/testsuite/gcc.dg/vect/pr99856.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr99856.c b/gcc/testsuite/gcc.dg/vect/pr99856.c
index e5d2a45be57..1ff20c7bc56 100644
--- a/gcc/testsuite/gcc.dg/vect/pr99856.c
+++ b/gcc/testsuite/gcc.dg/vect/pr99856.c
@@ -17,8 +17,8 @@ opSourceOver_premul(uint8_t* restrict Rrgba,
 const uint8_t* restrict Drgba, int len)
 {
   Rrgba = __builtin_assume_aligned (Rrgba, __BIGGEST_ALIGNMENT__);
-  Srgba = __builtin_assume_aligned (Rrgba, __BIGGEST_ALIGNMENT__);
-  Drgba = __builtin_assume_aligned (Rrgba, __BIGGEST_ALIGNMENT__);
+  Srgba = __builtin_assume_aligned (Srgba, __BIGGEST_ALIGNMENT__);
+  Drgba = __builtin_assume_aligned (Drgba, __BIGGEST_ALIGNMENT__);
   int i = 0;
   for (; i < len*4; i += 4)
     {
--
2.43.0
[committed] libstdc++: Replace implicit lambda capture of 'this' [PR116964]
Tested x86_64-linux (and tested the affected code by manually bodging
the _GLIBCXX_USE_PTHREAD_RWLOCK_T macro). Pushed to trunk.

-- >8 --

C++20 deprecates implicit capture of 'this', so change [=] to [this] for
all lambda expressions in <shared_mutex>.

This only shows up on targets where _GLIBCXX_USE_PTHREAD_RWLOCK_T is not
defined, as we have an alternative implementation of shared mutexes in
that case.

libstdc++-v3/ChangeLog:

	PR libstdc++/116964
	* include/std/shared_mutex (__shared_mutex_cv): Use [this] for
	lambda captures.
	(shared_timed_mutex) [!_GLIBCXX_USE_PTHREAD_RWLOCK_T]: Likewise.
---
 libstdc++-v3/include/std/shared_mutex | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index 9bf98c0b040..b369a15cc60 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -332,10 +332,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       unique_lock<mutex> __lk(_M_mut);
       // Wait until we can set the write-entered flag.
-      _M_gate1.wait(__lk, [=]{ return !_M_write_entered(); });
+      _M_gate1.wait(__lk, [this]{ return !_M_write_entered(); });
       _M_state |= _S_write_entered;
       // Then wait until there are no more readers.
-      _M_gate2.wait(__lk, [=]{ return _M_readers() == 0; });
+      _M_gate2.wait(__lk, [this]{ return _M_readers() == 0; });
     }

     bool
@@ -367,7 +367,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     lock_shared()
     {
       unique_lock<mutex> __lk(_M_mut);
-      _M_gate1.wait(__lk, [=]{ return _M_state < _S_max_readers; });
+      _M_gate1.wait(__lk, [this]{ return _M_state < _S_max_readers; });
       ++_M_state;
     }

@@ -690,13 +690,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
	unique_lock<mutex> __lk(_M_mut);
	if (!_M_gate1.wait_until(__lk, __abs_time,
-				 [=]{ return !_M_write_entered(); }))
+				 [this]{ return !_M_write_entered(); }))
	  {
	    return false;
	  }
	_M_state |= _S_write_entered;
	if (!_M_gate2.wait_until(__lk, __abs_time,
-				 [=]{ return _M_readers() == 0; }))
+				 [this]{ return _M_readers() == 0; }))
	  {
	    _M_state ^= _S_write_entered;
	    // Wake all threads blocked while the write-entered flag was set.
@@ -716,7 +716,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
	unique_lock<mutex> __lk(_M_mut);
	if (!_M_gate1.wait_until(__lk, __abs_time,
-				 [=]{ return _M_state < _S_max_readers; }))
+				 [this]{ return _M_state < _S_max_readers; }))
	  {
	    return false;
	  }
--
2.46.1
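A minimal standalone illustration of the same change, outside libstdc++: the `ReaderCount` class below is invented for this example and only mirrors the shape of the predicates in the patch. A lambda that reads member state needs `this`; writing `[this]` says so explicitly, whereas `[=]` captures it implicitly, which is the form C++20 deprecates.

```cpp
#include <cassert>

// Hypothetical class, loosely shaped like the reader bookkeeping above.
class ReaderCount {
  static constexpr int max_readers_ = 3;
  int state_ = 0;

public:
  // Admit one more reader if there is room.
  bool try_add_reader() {
    // Explicit capture of 'this'.  With [=] the body would be identical,
    // but 'this' would be captured implicitly.
    auto has_room = [this] { return state_ < max_readers_; };
    if (!has_room())
      return false;
    ++state_;
    assert(state_ <= max_readers_);
    return true;
  }

  int readers() const { return state_; }
};
```

Behaviour is unchanged by the spelling of the capture; in both cases the lambda holds the `this` pointer, not a copy of the object.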
[PATCH] Relax gcc.dg/vect/pr65947-8.c
When failing with forced SLP we do not print the non-SLP failure mode,
which reads slightly differently.  Massage the expectation a bit.

Pushed.

	* gcc.dg/vect/pr65947-8.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/pr65947-8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
index 9ced4dbb69f..827575778f8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-8.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
@@ -43,4 +43,4 @@ main (void)

 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_fold_extract_last } } } } } */
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_fold_extract_last } } } } */
-/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { target { ! { vect_fold_extract_last } } } } } */
+/* { dg-final { scan-tree-dump "multiple types in\[^\\n\\r\]* condition reduction" "vect" { target { ! { vect_fold_extract_last } } } } } */
--
2.43.0
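To show what the relaxed pattern accepts, here is a small sketch using std::regex on the unescaped form of the expression, `multiple types in[^\n\r]* condition reduction`. The testsuite itself uses Tcl regular expressions, so this is only an approximation of the matching, and the second message in the usage note below is a made-up example of a differently worded dump line, not actual vectorizer output.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if DUMP_LINE contains text matching the relaxed
// expectation: anything that stays on one line may sit between
// "multiple types in" and " condition reduction".
bool matches_relaxed_expectation(const std::string& dump_line) {
  static const std::regex pattern(
      "multiple types in[^\\n\\r]* condition reduction");
  return std::regex_search(dump_line, pattern);
}
```

With this, the original message "multiple types in double reduction or condition reduction" still matches, and so would a hypothetical variant such as "multiple types in SLP condition reduction", while text split across a line break does not.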
[PATCH] Add single-lane SLP support to .GOMP_SIMD_LANE vectorization
The following adds basic support for single-lane SLP .GOMP_SIMD_LANE vectorization, in particular it enables SLP discovery. * tree-vect-slp.cc (no_arg_map): New. (vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE. (vect_build_slp_tree_1): Likewise. * tree-vect-stmts.cc (vectorizable_call): Handle single-lane SLP for .GOMP_SIMD_LANE calls. --- gcc/tree-vect-slp.cc | 11 +++ gcc/tree-vect-stmts.cc | 27 +++ 2 files changed, 30 insertions(+), 8 deletions(-) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 2274d0e428e..125e69cf0eb 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -507,6 +507,7 @@ static const int cond_expr_maps[3][5] = { { 4, -2, -1, 1, 2 }, { 4, -1, -2, 2, 1 } }; +static const int no_arg_map[] = { 0 }; static const int arg0_map[] = { 1, 0 }; static const int arg1_map[] = { 1, 1 }; static const int arg2_map[] = { 1, 2 }; @@ -587,6 +588,9 @@ vect_get_operand_map (const gimple *stmt, bool gather_scatter_p = false, case IFN_CTZ: return arg0_map; + case IFN_GOMP_SIMD_LANE: + return no_arg_map; + default: break; } @@ -1175,6 +1179,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, ldst_p = true; rhs_code = CFN_MASK_STORE; } + else if (cfn == CFN_GOMP_SIMD_LANE) + ; else if ((cfn != CFN_LAST && cfn != CFN_MASK_CALL && internal_fn_p (cfn) @@ -1273,6 +1279,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, need_same_oprnds = true; first_op1 = gimple_call_arg (call_stmt, 1); } + else if (rhs_code == CFN_GOMP_SIMD_LANE) + { + need_same_oprnds = true; + first_op1 = gimple_call_arg (call_stmt, 1); + } } else { diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 584be52f423..b5dd03b25a4 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -3392,7 +3392,7 @@ vectorizable_call (vec_info *vinfo, if (ifn == IFN_LAST && !fndecl) { if (cfn == CFN_GOMP_SIMD_LANE - && !slp_node + && (!slp_node || SLP_TREE_LANES (slp_node) == 1) && loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->simduid && 
TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME @@ -3538,18 +3538,15 @@ vectorizable_call (vec_info *vinfo, /* Build argument list for the vectorized call. */ if (slp_node) { - vec vec_oprnds0; - + unsigned int vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); vect_get_slp_defs (vinfo, slp_node, &vec_defs); - vec_oprnds0 = vec_defs[0]; /* Arguments are ready. Create the new vector stmt. */ - FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_oprnd0) + for (i = 0; i < vec_num; ++i) { int varg = 0; if (masked_loop_p && reduc_idx >= 0) { - unsigned int vec_num = vec_oprnds0.length (); /* Always true for SLP. */ gcc_assert (ncopies == 1); vargs[varg++] = vect_get_loop_mask (loop_vinfo, @@ -3590,11 +3587,26 @@ vectorizable_call (vec_info *vinfo, vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); } + else if (cfn == CFN_GOMP_SIMD_LANE) + { + /* ??? For multi-lane SLP we'd need to build +{ 0, 0, .., 1, 1, ... }. */ + tree cst = build_index_vector (vectype_out, +i * nunits_out, 1); + tree new_var + = vect_get_new_ssa_name (vectype_out, vect_simple_var, +"cst_"); + gimple *init_stmt = gimple_build_assign (new_var, cst); + vect_init_vector_1 (vinfo, stmt_info, init_stmt, NULL); + new_temp = make_ssa_name (vec_dest); + new_stmt = gimple_build_assign (new_temp, new_var); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, + gsi); + } else { if (len_opno >= 0 && len_loop_p) { - unsigned int vec_num = vec_oprnds0.length (); /* Always true for SLP. */ gcc_assert (ncopies == 1); tree len @@ -3608,7 +3620,6 @@ vectorizable_call (vec_info *vinfo,
Re: [PATCH] libstdc++: Unroll loop in load_bytes function
On Fri, 4 Oct 2024 at 13:53, Dmitry Ilvokhin wrote:
>
> On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote:
> > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote:
> > >
> > > On Fri, 4 Oct 2024 at 07:53, Richard Biener wrote:
> > > >
> > > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote:
> > > > >
> > > > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote:
> > > > > >
> > > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin wrote:
> > > > > > >
> > > > > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > > > > >
> > > > > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > > > > >
> > > > > > > libstdc++-v3/ChangeLog:
> > > > > > >
> > > > > > >         * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > > > > >         loop using switch statement.
> > > > > > >
> > > > > > > Signed-off-by: Dmitry Ilvokhin
> > > > > > > ---
> > > > > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > index 3665375096a..294a7323dd0 100644
> > > > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > @@ -50,10 +50,29 @@ namespace
> > > > > > >    load_bytes(const char* p, int n)
> > > > > > >    {
> > > > > > >      std::size_t result = 0;
> > > > > > > -    --n;
> > > > > > > -    do
> > > > > > > -      result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > > > > -    while (--n >= 0);
> > > > > >
> > > > > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > > > > only hash the first 8 bytes.
> > > > >
> > > > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> > > >
> > > > The compiler should do such transforms - you probably want to tell
> > > > it that n < 8 though, it likely doesn't (always) know.
> > >
> > > e.g. like this?
> > >
> > > if ((n & 7) != n)
> > >   __builtin_unreachable();
> > >
> > > For the microbenchmark that seems to make things consistently worse:
> > > https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
> >
> > Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <=
> > n is always true too.
> >
>
> GCC still wasn't able to unroll the loop, even with a
> __builtin_unreachable, but benchmark link you mentioned above uses -O2
> optimization level (not sure if it was intentional).

That was intentional, because that's how libsupc++/hash_bytes.cc gets compiled.

>
> If we'll use -O3 [1], then GCC was able to unroll the loop for
> load_bytes_loop_assume version, but at the same time I am not sure all
> loop control instructions were elided, I still can see them on Godbolt
> version of generated code [2]. Benchmark charts partially confirm that,
> because performance of load_bytes_loop and load_bytes_loop_assume are
> now quite close (same actually, except case n = 1). I guess it would
> make sense, as we execute same amount of instructions.
>
> In addition, chart for load_bytes_switch look quite jumpy for [1] and
> became better for cases n = 1 and n = 2. At this point I am not sure it
> is not a code alignment issue and we are not measuring noise.
>
> [1]: https://quick-bench.com/q/LlcgMVhL61CasZVjCWbHd3uid8w
> [2]: https://godbolt.org/z/qPf1n7xWs
>
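For anyone following the thread without the benchmark open, here is a minimal sketch of the two variants being compared. The loop version mirrors the lines removed by the patch. The switch version is only an assumption about the general shape of the replacement, since the added lines are not quoted in this message. The names load_bytes_loop and load_bytes_switch are borrowed from the benchmark labels mentioned above, byte_at is a hypothetical helper, and std::uint64_t is used instead of std::size_t so the shifts stay well defined on 32-bit hosts.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read one byte without sign extension.
inline std::uint64_t byte_at(const char* p, int i)
{
  return static_cast<unsigned char>(p[i]);
}

// Loop variant, following the removed lines of the patch.
// Precondition taken from the thread: 1 <= n && n < 8.
inline std::uint64_t load_bytes_loop(const char* p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + byte_at(p, n);
  while (--n >= 0);
  return result;
}

// Switch variant (assumed shape, not the patch's actual text):
// each case adds one byte and falls through to the next lower index.
inline std::uint64_t load_bytes_switch(const char* p, int n)
{
  std::uint64_t result = 0;
  switch (n)
    {
    case 7: result |= byte_at(p, 6) << 48; // fall through
    case 6: result |= byte_at(p, 5) << 40; // fall through
    case 5: result |= byte_at(p, 4) << 32; // fall through
    case 4: result |= byte_at(p, 3) << 24; // fall through
    case 3: result |= byte_at(p, 2) << 16; // fall through
    case 2: result |= byte_at(p, 1) << 8;  // fall through
    case 1: result |= byte_at(p, 0);
    }
  return result;
}
```

Both functions place p[0] in the least significant byte, so they should agree for every n in the range the caller can pass. Whether the compiler turns the switch into a jump table, and whether that wins, depends on the optimization level and target, as the replies above discuss.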
[committed] libstdc++: Fix some Parallel Mode testsuite failures
There are more failures that I haven't found yet, because running make check-parallel seems to take several days (because I'm running with GLIBCXX_TESTSUITE_STDS=98,11,14,17,20,23,26). We can fix the rest later. Pushed to trunk. -- >8 -- Some of these are due to no longer using #pragma GCC system_header in libstdc++ headers, some have been failing for longer and weren't noticed. libstdc++-v3/ChangeLog: * include/parallel/algobase.h (search): Use sequential algorithm for constant evaluation. * include/parallel/algorithmfwd.h (search): Add _GLIBCXX20_CONSTEXPR. * include/parallel/multiway_merge.h: Remove stray semi-colon. * include/parallel/multiseq_selection.h: Add diagnostic pragmas for -Wlong-long warning. * include/parallel/quicksort.h: Likewise. * include/parallel/random_number.h: Likewise. * include/parallel/settings.h: Likewise. * include/parallel/workstealing.h: Replace ++ and -- on volatile variables. * testsuite/17_intro/names.cc: Skip names defined by . * testsuite/20_util/pair/dangling_ref.cc: Skip test if Parallel Mode is enabled. * testsuite/20_util/tuple/dangling_ref.cc: Likewise. 
--- libstdc++-v3/include/parallel/algobase.h | 6 ++ libstdc++-v3/include/parallel/algorithmfwd.h | 1 + libstdc++-v3/include/parallel/multiseq_selection.h | 3 +++ libstdc++-v3/include/parallel/multiway_merge.h | 2 +- libstdc++-v3/include/parallel/quicksort.h| 3 +++ libstdc++-v3/include/parallel/random_number.h| 5 + libstdc++-v3/include/parallel/settings.h | 5 + libstdc++-v3/include/parallel/workstealing.h | 4 ++-- libstdc++-v3/testsuite/17_intro/names.cc | 6 -- libstdc++-v3/testsuite/20_util/pair/dangling_ref.cc | 2 +- libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc | 2 +- 11 files changed, 32 insertions(+), 7 deletions(-) diff --git a/libstdc++-v3/include/parallel/algobase.h b/libstdc++-v3/include/parallel/algobase.h index 67362f4ecaa..b46ed610661 100644 --- a/libstdc++-v3/include/parallel/algobase.h +++ b/libstdc++-v3/include/parallel/algobase.h @@ -515,11 +515,17 @@ namespace __parallel // Public interface template +_GLIBCXX20_CONSTEXPR inline _FIterator1 search(_FIterator1 __begin1, _FIterator1 __end1, _FIterator2 __begin2, _FIterator2 __end2, _BinaryPredicate __pred) { +#if __cplusplus > 201703L + if (std::is_constant_evaluated()) + return _GLIBCXX_STD_A::search(__begin1, __end1, __begin2, __end2, + std::move(__pred)); +#endif return __search_switch(__begin1, __end1, __begin2, __end2, __pred, std::__iterator_category(__begin1), std::__iterator_category(__begin2)); diff --git a/libstdc++-v3/include/parallel/algorithmfwd.h b/libstdc++-v3/include/parallel/algorithmfwd.h index 476072b860a..7c9843ab161 100644 --- a/libstdc++-v3/include/parallel/algorithmfwd.h +++ b/libstdc++-v3/include/parallel/algorithmfwd.h @@ -353,6 +353,7 @@ namespace __parallel __gnu_parallel::sequential_tag); template +_GLIBCXX20_CONSTEXPR _FIter1 search(_FIter1, _FIter1, _FIter2, _FIter2, _BiPredicate); diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h b/libstdc++-v3/include/parallel/multiseq_selection.h index 22bd97e6432..53264fd156b 100644 --- 
a/libstdc++-v3/include/parallel/multiseq_selection.h +++ b/libstdc++-v3/include/parallel/multiseq_selection.h @@ -189,9 +189,12 @@ namespace __gnu_parallel __r = __rd_log2(__nmax) + 1; +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wlong-long" // LL literal // Pad all lists to this length, at least as long as any ns[__i], // equality iff __nmax = 2^__k - 1. __l = (1ULL << __r) - 1; +#pragma GCC diagnostic pop for (_SeqNumber __i = 0; __i < __m; __i++) { diff --git a/libstdc++-v3/include/parallel/multiway_merge.h b/libstdc++-v3/include/parallel/multiway_merge.h index e4bd0042282..d894e636a3e 100644 --- a/libstdc++-v3/include/parallel/multiway_merge.h +++ b/libstdc++-v3/include/parallel/multiway_merge.h @@ -2067,6 +2067,6 @@ namespace __gnu_parallel (__seqs_begin, __seqs_end, __target, __length, __comp, exact_tag(__tag.__get_num_threads())); } -}; // namespace __gnu_parallel +} // namespace __gnu_parallel #endif /* _GLIBCXX_PARALLEL_MULTIWAY_MERGE_H */ diff --git a/libstdc++-v3/include/parallel/quicksort.h b/libstdc++-v3/include/parallel/quicksort.h index a678b6d4690..c728cd91c24 100644 --- a/libstdc++-v3/include/parallel/quicksort.h +++ b/libstdc++-v3/include/parallel/quicksort.h @@ -66,12 +66,15 @@ namespace __gnu_parallel _ValueType* __samples = static_cast<_ValueType*> (::operator new(__num_samples * sizeof(_ValueType))); +#pragma GCC
Re: [PATCH] aarch64: Set Armv9-A generic L1 cache line size to 64 bytes
Hi Richard, > On 1 Oct 2024, at 13:35, Richard Sandiford wrote: > > External email: Use caution opening links or attachments > > > Kyrylo Tkachov writes: >> Hi all, >> I'd like to use a value of 64 bytes for the L1 cache size for Armv9-A >> generic tuning. >> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used >> to set the std::hardware_destructive_interference_size value which we want to >> be not overly large when running concurrent applications on large core-count >> systems. >> >> The generic value for Armv8-A systems and the port baseline is 256 bytes >> because that's what the A64FX CPU has, as set de-facto in >> aarch64_override_options_internal. >> >> But for Armv9-A CPUs as far as I know there isn't anything larger >> than 64 bytes, so we should be able to use the smaller value here and reduce >> the size of concurrent structs that use >> std::hardware_destructive_interference_size to pad their fields. >> >> Bootstrapped and tested on aarch64-none-linux-gnu. >> >> WDYT? > > I suppose doing this for a form of generic tuning goes somewhat against: > > /* Set up parameters to be used in prefetching algorithm. Do not > override the defaults unless we are tuning for a core we have > researched values for. */ > Yeah, I think the intent of that comment is for heuristics that guide the SW prefetch emission. I think it was added before the introduction of std::hardware_destructive_interference_size and its dependence on the L1 cache line size. > But I agree it doesn't make conceptual sense to constrain a known-to-be > Armv9-A core based on values that are only needed for Armv8-A cores. > So no objection from me FWIW. > Thanks, I’ll commit this version but we may want to refactor things a bit in that area in the future. It’s also something to consider when adding new Neoverse core support. > I think we would need to do something else if there are ever Armv9-A > cores with different L1 cache line sizes though. E.g. 
if a new Armv9-A > core has a 128-byte cache line, we would probably want to set the range > to [64, 128] rather than the patch's [64, 64], and rather than the > current [64, 256]. From what I can tell it would be catastrophically bad to have a smaller value than the real hardware because of introduction of false sharing. Having a larger value than the HW is not as bad, but suboptimal. So the compiler would have to pick the largest of the possible values IMO. Maybe there’s some clever scheme we could invent in aarch64_override_options_internal to go through all the CPUs of the specified architecture and above and take the maximum of their sizes automatically. These values are known to the compiler statically after all. Thanks, Kyrill > > Thanks, > Richard > > >> Thanks, >> Kyrill >> >> >>* config/aarch64/tuning_models/generic_armv9_a.h >>(generic_armv9a_prefetch_tune): Define. >>(generic_armv9_a_tunings): Use the above. >> >> From 93aa4ec4d972dfff02ccd6751af160ed243aa750 Mon Sep 17 00:00:00 2001 >> From: Kyrylo Tkachov >> Date: Fri, 20 Sep 2024 05:11:39 -0700 >> Subject: [PATCH] aarch64: Set Armv9-A generic L1 cache line size to 64 bytes >> >> I'd like to use a value of 64 bytes for the L1 cache size for Armv9-A >> generic tuning. >> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used >> to set the std::hardware_destructive_interference_size value which we want to >> be not overly large when running concurrent applications on large core-count >> systems. >> >> The generic value for Armv8-A systems and the port baseline is 256 bytes >> because that's what the A64FX CPU has, as set de-facto in >> aarch64_override_options_internal. >> >> But for Armv9-A CPUs as far as I know there isn't anything larger >> than 64 bytes, so we should be able to use the smaller value here and reduce >> the size of concurrent structs that use >> std::hardware_destructive_interference_size to pad their fields. 
>> >> Bootstrapped and tested on aarch64-none-linux-gnu. >> >> * config/aarch64/tuning_models/generic_armv9_a.h >> (generic_armv9a_prefetch_tune): Define. >> (generic_armv9_a_tunings): Use the above. >> --- >> gcc/config/aarch64/tuning_models/generic_armv9_a.h | 14 +- >> 1 file changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> index 85ed40f..76b3e4c9cf7 100644 >> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> @@ -207,6 +207,18 @@ static const struct cpu_vector_cost >> generic_armv9_a_vector_cost = >> &generic_armv9_a_vec_issue_info /* issue_info */ >> }; >> >> +/* Generic prefetch settings (which disable prefetch). */ >> +static const cpu_prefetch_tune generic_armv9a_prefetch_tune = >> +{ >> + 0, /* num_slots */ >> + -1,
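As an illustration of why the chosen line size matters to user code, here is a minimal sketch of the padding pattern the discussion refers to. It is not taken from the patch. kLineSize is a hard-coded stand-in for std::hardware_destructive_interference_size (an assumption, so the sketch compiles regardless of library support), and PaddedCounter and Counters are hypothetical types.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: stand-in for std::hardware_destructive_interference_size,
// fixed at the 64 bytes proposed for generic Armv9-A tuning.
constexpr std::size_t kLineSize = 64;

// Hypothetical per-thread counter. alignas rounds the size of each
// instance up to a whole cache line, so two counters never share one.
struct alignas(kLineSize) PaddedCounter
{
  std::atomic<unsigned long> value{0};
};

// Two counters updated by different threads.
struct Counters
{
  PaddedCounter a;
  PaddedCounter b;
};

static_assert(sizeof(PaddedCounter) == kLineSize,
              "each counter occupies exactly one assumed cache line");
static_assert(sizeof(Counters) == 2 * kLineSize,
              "the pair occupies two assumed cache lines");
```

With the 256-byte baseline value the same struct would grow to 512 bytes, which is the size cost the patch is trying to avoid. If kLineSize were smaller than the real line size, a and b could end up on the same line and contend, which is the false-sharing case described above.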
Re: [PATCH 2/3] cfgexpand: Handle scope conflicts better [PR111422]
On Fri, Oct 4, 2024, 12:07 AM Richard Biener wrote:

> On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote:
> >
> > After fixing loop-im to do the correct overflow rewriting
> > for pointer types too, we end up with code like:
> > ```
> > _9 = (unsigned long) &g;
> > _84 = _9 + 18446744073709551615;
> > _11 = _42 + _84;
> > _44 = (signed char *) _11;
> > ...
> > *_44 = 10;
> > g ={v} {CLOBBER(eos)};
> > ...
> > n[0] = &f;
> > *_44 = 8;
> > g ={v} {CLOBBER(eos)};
> > ```
> > which was not being recognized by the scope conflicts code.
> > This was because it only handled one-level walk-backs rather than multiple ones.
> > This fixes it by using a work_list to avoid huge recursion and a visited bitmap to avoid
> > going into an infinite loop when dealing with loops.
>
> Ick.  This is now possibly an unbound walk from every use (even duplicate use!).
> Micro-optimizing would be restricting the INTEGRAL_TYPE_P types to ones
> matching pointer size.  Another micro-optimization would be to track/cache
> whether a SSA def is based on a pointer, more optimizing to cache what
> pointer(s!) it is based on.
>
> There's testcases in bugzilla somewhere hard on compile-time in this code
> and I can imagine a trivial degenerate one to trigger the issue.

I was thinking about that too. Adding a cache should be easy, especially one
that lives over the whole walk of the basic blocks. And yes, stopping at
integer sizes that are less than a pointer size also seems a reasonable idea.

Note I have a patch on top of this where vector types and constructs are
handled too.

Will work on this tomorrow.

Thanks,
Andrew

> Richard.
>
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> >         PR tree-optimization/111422
> >
> > gcc/ChangeLog:
> >
> >         * cfgexpand.cc (add_scope_conflicts_2): Rewrite to be a full walk
> >         of all operands and their uses.
> >
> > Signed-off-by: Andrew Pinski
> > ---
> >  gcc/cfgexpand.cc | 46 +++---
> >  1 file changed, 27 insertions(+), 19 deletions(-)
> >
> > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> > index 6c1096363af..2e653d7207c 100644
> > --- a/gcc/cfgexpand.cc
> > +++ b/gcc/cfgexpand.cc
> > @@ -573,32 +573,40 @@ visit_conflict (gimple *, tree op, tree, void *data)
> >
> >  /* Helper function for add_scope_conflicts_1.  For USE on
> >     a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to be
> > -   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  */
> > +   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  Also walk
> > +   the assignments backwards as they might be based on an ADDR_EXPR.  */
> >
> > -static inline void
> > +static void
> >  add_scope_conflicts_2 (tree use, bitmap work,
> >                         walk_stmt_load_store_addr_fn visit)
> >  {
> > -  if (TREE_CODE (use) == SSA_NAME
> > -      && (POINTER_TYPE_P (TREE_TYPE (use))
> > -          || INTEGRAL_TYPE_P (TREE_TYPE (use))))
> > +  auto_vec work_list;
> > +  auto_bitmap visited_ssa_names;
> > +  work_list.safe_push (use);
> > +
> > +  while (!work_list.is_empty())
> >      {
> > -      gimple *g = SSA_NAME_DEF_STMT (use);
> > -      if (gassign *a = dyn_cast <gassign *> (g))
> > +      use = work_list.pop();
> > +      if (!use)
> > +        continue;
> > +      if (TREE_CODE (use) == ADDR_EXPR)
> > +        visit (nullptr, TREE_OPERAND (use, 0), use, work);
> > +      else if (TREE_CODE (use) == SSA_NAME
> > +               && (POINTER_TYPE_P (TREE_TYPE (use))
> > +                   || INTEGRAL_TYPE_P (TREE_TYPE (use))))
> >          {
> > -          if (tree op = gimple_assign_rhs1 (a))
> > -            if (TREE_CODE (op) == ADDR_EXPR)
> > -              visit (a, TREE_OPERAND (op, 0), op, work);
> > +          gimple *g = SSA_NAME_DEF_STMT (use);
> > +          if (!bitmap_set_bit (visited_ssa_names, SSA_NAME_VERSION(use)))
> > +            continue;
> > +          if (gassign *a = dyn_cast <gassign *> (g))
> > +            {
> > +              for (unsigned i = 1; i < gimple_num_ops (g); i++)
> > +                work_list.safe_push (gimple_op (a, i));
> > +            }
> > +          else if (gphi *p = dyn_cast <gphi *> (g))
> > +            for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> > +              work_list.safe_push (gimple_phi_arg_def (p, i));
> >          }
> > -      else if (gphi *p = dyn_cast <gphi *> (g))
> > -        for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> > -          if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
> > -            if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
> > -              {
> > -                if (tree op = gimple_assign_rhs1 (a))
> > -                  if (TREE_CODE (op) == ADDR_EXPR)
> > -                    visit (a, TREE_OPERAND (op, 0), op, work);
> > -              }
> >      }
> >  }
> >
> > --
> > 2.34.1
> >
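For readers who do not know the GCC internals, the walk in the patch can be sketched independently of GIMPLE as a worklist traversal with a visited set. This is a simplified illustration, not the GCC code: Node, is_addr, ops and collect_addr_bases are all hypothetical names, node indices stand in for SSA names, and unlike the patch the sketch marks every node as visited rather than only SSA names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a definition in the IR.
struct Node
{
  bool is_addr;                  // plays the role of an ADDR_EXPR
  std::vector<std::size_t> ops;  // operand indices (RHS operands or PHI args)
};

// Walk backwards from START through operand edges and return the
// indices of all address nodes it may be based on.  The explicit
// worklist avoids deep recursion; the visited vector guarantees
// termination when PHI-like nodes form a cycle.
std::vector<std::size_t>
collect_addr_bases(const std::vector<Node>& nodes, std::size_t start)
{
  std::vector<std::size_t> found;
  std::vector<bool> visited(nodes.size(), false);
  std::vector<std::size_t> work_list;
  work_list.push_back(start);

  while (!work_list.empty())
    {
      std::size_t cur = work_list.back();
      work_list.pop_back();
      if (visited[cur])
        continue;
      visited[cur] = true;

      if (nodes[cur].is_addr)
        {
          // Corresponds to invoking VISIT on the ADDR_EXPR.
          found.push_back(cur);
          continue;
        }
      for (std::size_t op : nodes[cur].ops)
        work_list.push_back(op);
    }
  return found;
}
```

The visited set only bounds a single walk. Richard's concern above is that a fresh walk starts from every use, which is what the proposed cache across the whole basic-block walk would address.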
RE: [PATCH] middle-end: reorder masking priority of math functions
Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento
> Sent: Wednesday, October 2, 2024 5:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina ; richard.guent...@gmail.com;
> Victor Do Nascimento
> Subject: [PATCH] middle-end: reorder masking priority of math functions
>
> Given the categorization of math built-in functions as `ECF_CONST',
> when if-converting their uses, their calls are not masked and are thus
> called with an all-true predicate.
>
> This, however, is not appropriate where built-ins have library
> equivalents, wherein they may exhibit highly architecture-specific
> behaviors.  For example, vectorized implementations may delegate the
> computation of values outside a certain acceptable numerical range to
> special (non-vectorized) routines which considerably slow down
> computation.
>
> As numerical simulation programs often do bounds check on input values
> prior to math calls, conditionally assigning default output values for
> out-of-bounds input and skipping the math call altogether, these
> fallback implementations should seldom be called in the execution of
> vectorized code.  If, however, we don't apply any masking to these
> math functions, we end up effectively executing both if and else
> branches for these values, leading to considerable performance
> degradation on scientific workloads.
>
> We therefore invert the order of handling of math function calls in
> `if_convertible_stmt_p' to prioritize the handling of their
> library-provided implementations over the equivalent internal function.

I think this makes sense to me from a technical standpoint and from an SVE one.

Though I think the original order may have been there because of the assumption
that on some uarches unpredicated implementations are faster than predicated ones.

So there may be some concerns about this order being slower for some.
I'll leave it up to Richi since e.g. I don't know the perf characteristics of the
x86 variants here, but if there is a concern you could use the
conditional_operation_is_expensive target hook to decide on the preferred order.

But other than that the change itself looks good to me, but you still need approval.

Cheers,
Tamar

>
> Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
> new regressions.
>
> gcc/ChangeLog:
>
>         * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
>         function declaration before IFN fallback.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-fncall-mask-math.c: New.
> ---
>  .../gcc.dg/vect/vect-fncall-mask-math.c | 33 +++
>  gcc/tree-if-conv.cc                     | 18 +-
>  2 files changed, 42 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> new file mode 100644
> index 000..15e22da2807
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> @@ -0,0 +1,33 @@
> +/* Test the correct application of masking to autovectorized math function calls.
> +   Test is currently set to xfail pending the release of the relevant lmvec
> +   support.  */
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */
> +
> +#include
> +
> +const int N = 20;
> +const float lim = 101.0;
> +const float cst = -1.0;
> +float tot = 0.0;
> +
> +float b[20];
> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch.  */
> +                [10 ... 19] = 100.0 };    /* Else branch.  */
> +
> +int main (void)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < N; i += 1)
> +    {
> +      if (a[i] > lim)
> +        b[i] = cst;
> +      else
> +        b[i] = expf (a[i]);
> +      tot += b[i];
> +    }
> +  return (0);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { xfail { aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} ifcvt { xfail { aarch64*-*-* } } } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 3b04d1e8d34..90c754a4814 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt, vec refs)
>
>      case GIMPLE_CALL:
>        {
> -        /* There are some IFN_s that are used to replace builtins but have the
> -           same semantics.  Even if MASK_CALL cannot handle them vectorable_call
> -           will insert the proper selection, so do not block conversion.  */
> -        int flags = gimple_call_flags (stmt);
> -        if ((flags & ECF_CONST)
> -            && !(flags & ECF_LOOPING_CONST_OR_PURE)
> -            && gimple_call_combined_fn (stmt) != CFN_LAST)
> -          return true;
> -
>          tree fndecl = gimple_call_fndecl (stmt);
>          if (fndecl)
>            {
> @@ -1160,6 +1151,15 @@ if_conv
Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
On Fri, 4 Oct 2024 at 13:28, Jakub Jelinek wrote: > > On Fri, Oct 04, 2024 at 12:52:11PM +0100, Jonathan Wakely wrote: > > This doesn't really belong in our testsuite, because the sole purpose of > > the new test is to find bugs in the Glibc wrappers (like the one linked > > below). But maybe it's a kindness to do it in our testsuite, because we > > already have this test in place, and one Glibc bug was already found > > thanks to Sam running the existing test with _FORTIFY_SOURCE defined. > > > > Should we do this? > > I think so. While those bugs are glibc bugs, libstdc++ uses libc headers > and so if they have namespace cleanness issues, so does libstdc++. Yeah, we have lots of #undef in that test to deal with libc headers that we can't change, but for Glibc we know we can fix problems much more easily than for e.g. proprietary UNIX headers. > > > Add a new testcase that repeats 17_intro/names.cc but with > > _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like > > https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed > > now). > > > > libstdc++-v3/ChangeLog: > > > > PR libstdc++/116210 > > * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc > > that use it in the fortify wrappers. > > * testsuite/17_intro/names_fortify.cc: New test. > > Jakub >
[Patch] OpenMP: Allocate directive for static vars, clean up
'omp allocate' permits using a different (specified) allocator and alignment
for both stack/automatic and static/saved variables; the latter takes only
predefined allocators.

Currently, only C and Fortran are supported for stack/automatic variables;
static variables are rejected before the attached patch. (For them, only
predefined allocators are permitted.)

* * *

I happened to look at the 'allocate' directive recently and, doing so, I
stumbled over a couple of issues, which the attached patch addresses
(missing diagnostics for corner cases, not updated checks, unhelpful
documentation ['allocate' *clause*], ...).

Doing so, I wondered: shouldn't we just accept 'omp allocate' for static
variables by just honoring the alignment and ignoring the actually requested
allocator? - First, we already do the same for actual allocations as not all
traits are supported. And for the host this seems to be the most sensible
thing to do in any case. [For some use cases, pointers + allocation in the
constructor would be better, but in general, not adding an indirection seems
to be better and has fewer corner-case usability issues.]

I guess we later want to honor the requested memory for nvptx and/or gcn; at
least Nvidia GPUs could make use of constant memory (having advantages for
reading the same memory by many threads/broadcasting it). I guess OpenACC
2.7's 'readonly' modifier serves a similar purpose. For now we don't, but the
attribute is passed on to the backends, which could make use of it, if
desired. ('groupprivate' directive vs. cgroup/thread allocators are similar
device-only features.)

As mentioned, this patch also fixes a few other issues here and there, see
commit log and source code for details.

Code comments? Suggestions or remarks? - Before I apply this patch?

Tobias

PS: I am aware that C++ support is lacking. There is a pending patch that
needs to be updated for this patch, probably some bitrotting, and in
particular for the review comments, cf.
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html and https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639929.html OpenMP: Allocate directive for static vars, clean up For the 'allocate' directive, remove the sorry for static variables and just keep using normal memory, but honor the requested alignment and set a DECL_ATTRIBUTE in case a target may want to make use of this later on. The documentation is updated accordingly. The C diagnostic to check for predefined allocators in this case failed to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was already okay; but both now use new common #defined value for checking.) And while Fortran common block variables are still rejected, the check has been improved as before the sorry diagnostic did not work for common blocks in modules. Finally, for 'allocate' clause on the target/task/taskloop directives, there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator with access = thread), which is undefined behavior according to the OpenMP specification. And, last, testing showed that var decl + static_assert sets TREE_USED but does not produce a statement list in C, which did run into an assert in gimplify. This special case is now also handled. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_allocate): Set alignment for alignof; accept static variables and fix predef allocator check. gcc/fortran/ChangeLog: * openmp.cc (is_predefined_allocator): Use gomp-constants.h consts. * trans-common.cc (translate_common): Reject OpenMP allocate directives. * trans-decl.cc (gfc_finish_var_decl): Handle allocate directive for static variables. (gfc_trans_deferred_vars): Update for the latter. gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP allocate directive. (gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used as allocator with the target/task/taskloop directive. 
include/ChangeLog: * gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX, GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX, GOMP_OMP_PREDEF_ALLOC_THREADS): New defines. libgomp/ChangeLog: * allocator.c: Add static asserts for news GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values. * libgomp.texi (OpenMP Impl. Status): Allocate directive for static vars is now supported. Refer to PR for allocate clause. (Memory allocation): Update for static vars; minor word tweaking. gcc/testsuite/ChangeLog: * c-c++-common/gomp/allocate-9.c: Update for removed sorry. * gfortran.dg/gomp/allocate-15.f90: Likewise. * gfortran.dg/gomp/allocate-pinned-1.f90: Likewise. * gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for previously missing diagnostic. * c-c++-common/gomp/allocate-18.c: New test. * c-c++-common/gomp/allocate-19.c: New test. * gfortran.dg/gomp/allocate-clause.f90: New test. * gfortran.dg/gomp/allocate-static-2.f90: New test. * gfortran.dg/gomp/allocate-static.f90: New test. gcc/c/c-parser.cc | 29
Re: [PATCH 1/2] c++: add -Wdeprecated-literal-operator [CWG2521]
On 10/4/24 8:22 AM, Jakub Jelinek wrote: On Fri, Oct 04, 2024 at 12:19:03PM +0200, Jakub Jelinek wrote: Though, maybe the tests should have both the deprecated syntax and the non-deprecated one... Here is a variant of the patch which does that. Tested on x86_64-linux and i686-linux, ok for trunk? OK. 2024-10-04 Jakub Jelinek * g++.dg/cpp26/unevalstr1.C: Revert the 2024-10-03 changes, instead expect extra warnings. Add another set of tests without space between " and _. * g++.dg/cpp26/unevalstr2.C: Expect extra warnings for C++23. Add another set of tests without space between " and _. --- gcc/testsuite/g++.dg/cpp26/unevalstr1.C.jj 2024-10-04 12:28:08.820899177 +0200 +++ gcc/testsuite/g++.dg/cpp26/unevalstr1.C 2024-10-04 14:15:35.563531334 +0200 @@ -83,21 +83,57 @@ extern "\o{0103}" { int f14 (); } // { d [[nodiscard ("\x{20}")]] int h19 ();// { dg-error "numeric escape sequence in unevaluated string" } [[nodiscard ("\h")]] int h20 ();// { dg-error "unknown escape sequence" } -float operator ""_my0 (const char *); -float operator "" ""_my1 (const char *); -float operator L""_my2 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u""_my3 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator U""_my4 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u8""_my5 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator L"" ""_my6 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u"" ""_my7 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator U"" ""_my8 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u8"" ""_my9 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" L""_my10 (const char *); // { dg-error "invalid encoding prefix in literal 
operator" } -float operator "" u""_my11 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" U""_my12 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" u8""_my13 (const char *);// { dg-error "invalid encoding prefix in literal operator" } -float operator "\0"_my14 (const char *); // { dg-error "expected empty string after 'operator' keyword" } -float operator "\x00"_my15 (const char *); // { dg-error "expected empty string after 'operator' keyword" } -float operator "\h"_my16 (const char *); // { dg-error "expected empty string after 'operator' keyword" } +float operator "" _my0 (const char *); +float operator "" "" _my1 (const char *); +float operator L"" _my2 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator u"" _my3 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator U"" _my4 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator u8"" _my5 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator L"" "" _my6 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator u"" "" _my7 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator U"" "" _my8 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator u8"" "" _my9 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" L"" _my10 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" u"" _my11 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" U"" _my12 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" u8"" _my13 (const char *); // { dg-error "invalid encoding prefix in literal operator" } 
+float operator "\0" _my14 (const char *);// { dg-error "expected empty string after 'operator' keyword" } +float operator "\x00" _my15 (const char *); // { dg-error "expected empty string after 'operator' keyword" } +float operator "\h" _my16 (const char *);// { dg-error "expected empty string after 'operator' keyword" } + // { dg-error "unknown escape sequence" "" { target *-*-* } .-1 } +// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 } +// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 } +// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 } +// { dg-warning "space between quotes
Re: [PATCH] libstdc++: Unroll loop in load_bytes function
On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote: > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote: > > > > On Fri, 4 Oct 2024 at 07:53, Richard Biener > > wrote: > > > > > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote: > > > > > > > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote: > > > > > > > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin > > > > > wrote: > > > > > > > > > > > > Instead of looping over every byte of the tail, unroll loop manually > > > > > > using switch statement, then compilers (at least GCC and Clang) will > > > > > > generate a jump table [1], which is faster on a microbenchmark [2]. > > > > > > > > > > > > [1]: https://godbolt.org/z/aE8Mq3j5G > > > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24 > > > > > > > > > > > > libstdc++-v3/ChangeLog: > > > > > > > > > > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll > > > > > > loop using switch statement. > > > > > > > > > > > > Signed-off-by: Dmitry Ilvokhin > > > > > > --- > > > > > > libstdc++-v3/libsupc++/hash_bytes.cc | 27 > > > > > > +++ > > > > > > 1 file changed, 23 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc > > > > > > b/libstdc++-v3/libsupc++/hash_bytes.cc > > > > > > index 3665375096a..294a7323dd0 100644 > > > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc > > > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc > > > > > > @@ -50,10 +50,29 @@ namespace > > > > > >load_bytes(const char* p, int n) > > > > > >{ > > > > > > std::size_t result = 0; > > > > > > ---n; > > > > > > -do > > > > > > - result = (result << 8) + static_cast(p[n]); > > > > > > -while (--n >= 0); > > > > > > > > > > Don't we still need to loop, for the case where n >= 8? Otherwise we > > > > > only hash the first 8 bytes. 
> > > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> > >
> > > The compiler should do such transforms - you probably want to tell
> > > it that n < 8 though, it likely doesn't (always) know.
> >
> > e.g. like this?
> >
> > if ((n & 7) != n)
> >   __builtin_unreachable();
> >
> > For the microbenchmark that seems to make things consistently worse:
> > https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
>
> Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <=
> n is always true too.
>

GCC still wasn't able to unroll the loop, even with __builtin_unreachable,
but the benchmark link you mentioned above uses the -O2 optimization level
(not sure if that was intentional).

With -O3 [1], GCC was able to unroll the loop for the
load_bytes_loop_assume version, but I am not sure all loop control
instructions were elided: I can still see them in the generated code on
Godbolt [2]. The benchmark charts partially confirm that, because the
performance of load_bytes_loop and load_bytes_loop_assume is now quite
close (the same, actually, except for the case n = 1). I guess that makes
sense, as we execute the same number of instructions.

In addition, the chart for load_bytes_switch looks quite jumpy for [1]
and became better for the cases n = 1 and n = 2. At this point I am not
sure that this isn't a code alignment issue and that we aren't just
measuring noise.

[1]: https://quick-bench.com/q/LlcgMVhL61CasZVjCWbHd3uid8w
[2]: https://godbolt.org/z/qPf1n7xWs
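For readers without the benchmark links at hand, here is a minimal sketch of the three variants being compared. The loop body follows the removed lines quoted in the patch; the switch form is a hypothetical reconstruction (the added lines are not visible in this thread), and the names load_bytes_loop, load_bytes_loop_assume and load_bytes_switch are borrowed from the benchmark discussion rather than from libstdc++. All three assume 1 <= n <= 7, which is what load_bytes(end, len & 0x7) relies on.

```cpp
#include <cassert>
#include <cstddef>

// Helper: read one byte as an unsigned value (avoids sign extension).
inline std::size_t byte_at(const char* p, int i)
{
  return static_cast<unsigned char>(p[i]);
}

// Loop form, following the lines the patch removes. Valid for 1 <= n <= 7.
inline std::size_t load_bytes_loop(const char* p, int n)
{
  assert(1 <= n && n < 8);
  std::size_t result = 0;
  --n;
  do
    result = (result << 8) + byte_at(p, n);
  while (--n >= 0);
  return result;
}

// Same loop plus the range hint from the thread. __builtin_unreachable is a
// GCC/Clang builtin, so it is guarded; other compilers get the plain loop.
inline std::size_t load_bytes_loop_assume(const char* p, int n)
{
#if defined(__GNUC__)
  if (!(1 <= n && n < 8))
    __builtin_unreachable();
#endif
  std::size_t result = 0;
  --n;
  do
    result = (result << 8) + byte_at(p, n);
  while (--n >= 0);
  return result;
}

// Switch form (hypothetical reconstruction): every case falls through to the
// next, so entering at case n performs exactly the n steps of the loop.
inline std::size_t load_bytes_switch(const char* p, int n)
{
  std::size_t result = 0;
  switch (n & 7)
    {
    case 7: result = (result << 8) + byte_at(p, 6); // fall through
    case 6: result = (result << 8) + byte_at(p, 5); // fall through
    case 5: result = (result << 8) + byte_at(p, 4); // fall through
    case 4: result = (result << 8) + byte_at(p, 3); // fall through
    case 3: result = (result << 8) + byte_at(p, 2); // fall through
    case 2: result = (result << 8) + byte_at(p, 1); // fall through
    case 1: result = (result << 8) + byte_at(p, 0);
    }
  return result;
}
```

Whether the switch form actually wins depends on the optimization level and on code layout, as noted above; the sketch only shows that the variants compute the same value.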
Re: [patch,testsuite] Some float64 and float32x test require double64plus.
On Oct 4, 2024, at 9:40 AM, Georg-Johann Lay wrote: > > Some of the float64 and float32x test cases are using double built-ins > and hence require double64plus resp. double_float32xplus, i.e. double > is at least as good as float32x. > > This patch adds according dg-require-effective-target filters. > (But only for test cases where I can verify that they are working > with double64+ but are failing with double32.) > > Ok for trunk? Ok. If you are that domain expert, these sorts of changes are more obvious to you than to me. :-)
[PATCH v3 0/5] openmp: Add support for iterators in OpenMP mapping clauses
This is a further improved patch series to that posted at: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662138.html The main change is that the expansion of the iterators is pushed back further to the omp-lowering stage. This is because the recently committed deep-mapping support (and features such as strided array updates in the OpenMP development branch) do their work in the omplower stage, but the iterators need to be expanded after any changes to the clauses and their decls/sizes have occurred. This patch set does not support deep-mapping yet - it just emits a sorry when this happens. The iterator expansion now does not happen all at once. A new loop is generated the first time a clause with a new iterator is found, and is reused if the same iterator is used in another clause. Assigning the clause decl and size in the iterator loop are now done by calling separate functions from lower_target, which also return the new hostaddr/size expression from the iterator loop that should be passed to libgomp. This fits in better with the existing code structure. A final function is called to finalise all the loops. As multiple sets of loops can be 'in-flight' at once, a new structure and a hash map are used to keep track of their states. Instead of making the OMP_CLAUSE_DECL into a tree list with the iterator in TREE_PURPOSE and original decl in TREE_VALUE, I have stored the iterator in a third argument in the clause tree node addressed by OMP_CLAUSE_ITERATOR instead. In this way, changes do not have to be made in the intervening code-path to extract the original OMP_CLAUSE_DECL (which is messy and error-prone), allowing code unrelated to iterators to go unmodified. Nearly all special-cases for iterators have now been removed. I have also fixed some issues detected by Linaro CI - some format specifier issues, and some tests that expect a non-unified target address space failing. Gomp GCC tests and libgomp tests run on x86_64 host with Nvidia offloading. 
Okay for trunk? Kwok
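For context, the kind of user code this series is aimed at might look like the sketch below. It is an illustration only, not one of the testcases from the patches: DIM1, DIM2 and sum_rows_on_target are made-up names, the iterator bounds are constants, and the clause spelling follows my reading of the OpenMP map-clause iterator modifier. Compiled without -fopenmp the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Hypothetical sizes for the illustration; both are compile-time constants.
constexpr int DIM1 = 8;
constexpr int DIM2 = 16;

// Sum a table that is reachable only through an array of row pointers.
// The iterator modifier asks for one mapping per row, x[i][0:DIM2], instead
// of spelling out DIM1 separate map clauses.
inline int sum_rows_on_target()
{
  static int storage[DIM1][DIM2];
  int *x[DIM1];
  for (int i = 0; i < DIM1; i++)
    {
      x[i] = storage[i];
      for (int j = 0; j < DIM2; j++)
        x[i][j] = i * DIM2 + j;
    }

  int sum = 0;
#pragma omp target map(iterator(i = 0:DIM1), to: x[i][:DIM2]) map(tofrom: sum)
  for (int i = 0; i < DIM1; i++)
    for (int j = 0; j < DIM2; j++)
      sum += x[i][j];

  assert(sum >= 0);
  return sum;
}
```

Each value of the iterator variable contributes one entry to the list of mappings, which is why the cover letter talks about expanding the iterator into a loop that fills in the host address and size of every generated map.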
[PATCH v3 1/5] openmp: Refactor handling of iterators
This patch factors out the code to calculate the number of iterations
required and to generate the iteration loop into separate functions from
gimplify_omp_depend for reuse later.

I have also replaced the 'TREE_CODE (*tp) == TREE_LIST && ...' checks used
for detecting an iterator clause with a macro OMP_ITERATOR_DECL_P, as it
needs to be done frequently.

From 34bf780b1e0395028ecdacfa1385238a8da13be6 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung
Date: Fri, 4 Oct 2024 15:15:42 +0100
Subject: [PATCH 1/5] openmp: Refactor handling of iterators

Move code to calculate the iteration size and to generate the iterator
expansion loop into separate functions.

Use OMP_ITERATOR_DECL_P to check for iterators in clause declarations.

2024-10-04  Kwok Cheung Yeung

gcc/c-family/
        * c-omp.cc (c_finish_omp_depobj): Use OMP_ITERATOR_DECL_P.

gcc/c/
        * c-typeck.cc (handle_omp_array_sections): Use OMP_ITERATOR_DECL_P.
        (c_finish_omp_clauses): Likewise.

gcc/cp/
        * pt.cc (tsubst_omp_clause_decl): Use OMP_ITERATOR_DECL_P.
        * semantics.cc (handle_omp_array_sections): Likewise.
        (finish_omp_clauses): Likewise.

gcc/
        * gimplify.cc (gimplify_omp_affinity): Use OMP_ITERATOR_DECL_P.
        (compute_iterator_count): New.
        (build_iterator_loop): New.
        (gimplify_omp_depend): Use OMP_ITERATOR_DECL_P, compute_iterator_count
        and build_iterator_loop.
        * tree-inline.cc (copy_tree_body_r): Use OMP_ITERATOR_DECL_P.
        * tree-pretty-print.cc (dump_omp_clause): Likewise.
        * tree.h (OMP_ITERATOR_DECL_P): New macro.
--- gcc/c-family/c-omp.cc| 4 +- gcc/c/c-typeck.cc| 13 +- gcc/cp/pt.cc | 4 +- gcc/cp/semantics.cc | 8 +- gcc/gimplify.cc | 326 +++ gcc/tree-inline.cc | 5 +- gcc/tree-pretty-print.cc | 8 +- gcc/tree.h | 6 + 8 files changed, 175 insertions(+), 199 deletions(-) diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc index 620a3c1353a..24c8a801255 100644 --- a/gcc/c-family/c-omp.cc +++ b/gcc/c-family/c-omp.cc @@ -744,9 +744,7 @@ c_finish_omp_depobj (location_t loc, tree depobj, kind = OMP_CLAUSE_DEPEND_KIND (clause); t = OMP_CLAUSE_DECL (clause); gcc_assert (t); - if (TREE_CODE (t) == TREE_LIST - && TREE_PURPOSE (t) - && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC) + if (OMP_ITERATOR_DECL_P (t)) { error_at (OMP_CLAUSE_LOCATION (clause), "% modifier may not be specified on " diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index ba6d96d26b2..30a03f071d8 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -14504,9 +14504,7 @@ handle_omp_array_sections (tree &c, enum c_omp_region_type ort) tree *tp = &OMP_CLAUSE_DECL (c); if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AFFINITY) - && TREE_CODE (*tp) == TREE_LIST - && TREE_PURPOSE (*tp) - && TREE_CODE (TREE_PURPOSE (*tp)) == TREE_VEC) + && OMP_ITERATOR_DECL_P (*tp)) tp = &TREE_VALUE (*tp); tree first = handle_omp_array_sections_1 (c, *tp, types, maybe_zero_len, first_non_one, @@ -15697,9 +15695,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) case OMP_CLAUSE_DEPEND: case OMP_CLAUSE_AFFINITY: t = OMP_CLAUSE_DECL (c); - if (TREE_CODE (t) == TREE_LIST - && TREE_PURPOSE (t) - && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC) + if (OMP_ITERATOR_DECL_P (t)) { if (TREE_PURPOSE (t) != last_iterators) last_iterators_remove @@ -15799,10 +15795,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) break; } } - if (TREE_CODE (OMP_CLAUSE_DECL (c)) == TREE_LIST - && TREE_PURPOSE (OMP_CLAUSE_DECL (c)) - && (TREE_CODE (TREE_PURPOSE (OMP_CLAUSE_DECL (c))) - == 
TREE_VEC)) + if (OMP_ITERATOR_DECL_P (OMP_CLAUSE_DECL (c))) TREE_VALUE (OMP_CLAUSE_DECL (c)) = t; else OMP_CLAUSE_DECL (c) = t; diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 43468e5f62e..5a72402ba1f 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -17604,9 +17604,7 @@ tsubst_omp_clause_decl (tree decl, tree args, tsubst_flags_t complain, return decl; /* Handle OpenMP iterators. */ - if (TREE_CODE (decl) == TREE_LIST - && TREE_PURPOSE (decl) - && TREE_CODE (TREE_PURPOSE (decl)) == TREE_VEC) + if (OMP_ITERATOR_DECL_P (decl)) { tree ret; if (iterator_cache[0] == TREE_PURPOSE (decl)) diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 0cb46c1986c..4f856a9d749 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/
[PATCH v3 2/5] openmp: Add support for iterators in map clauses (C/C++)
This patch modifies the C and C++ parsers to accept an iterator as a map
type modifier, storing it in the OMP_CLAUSE_ITERATOR argument of the
clause. When finishing clauses, any clauses generated from a clause with
iterators also have the iterator applied to them.

During gimplification, check_omp_map_iterators is called to check that all
iterator variables are referenced at some point within a clause.
Gimplification of the clause decl and size is delayed until iterator
expansion, as they may reference iterator variables.

In lower_target, lower_omp_map_iterators is called to construct the
expansion loop for iterator clauses. Clauses using the same set of
iterators reuse the loop, though with different storage allocated for
them. lower_omp_map_iterator_expr is called to add to the loop the final
expression that is sent as the hostaddr for libgomp, and a reference to
the array generated by the iterator loop is returned to replace the
original expression. lower_omp_map_iterator_size works similarly for the
clause size. finish_omp_map_iterators is called later to finalise the
loop.

Libgomp has a new function gomp_merge_iterator_maps which identifies data
coming from an iterator, and effectively creates new maps on-the-fly from
the iterator info array, inserting them into the list of mappings at the
point where the iterator data occurred. As there are now multiple maps
where there was previously one, an entry is only added to the target vars
for the first expanded map; otherwise it will get out of sync with the
expected layout and the wrong variables will be picked up by the target
function.

From 50557e513ca534ba32f50d1b056a07a6f671 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung
Date: Fri, 4 Oct 2024 15:16:12 +0100
Subject: [PATCH 2/5] openmp: Add support for iterators in map clauses (C/C++)

This adds preliminary support for iterators in map clauses within OpenMP
'target' constructs (which includes constructs such as 'target enter data').
Iterators with non-constant loop bounds are not currently supported.

2024-10-04  Kwok Cheung Yeung

gcc/c/
        * c-parser.cc (c_parser_omp_clause_map): Parse 'iterator' modifier.
        * c-typeck.cc (c_finish_omp_clauses): Finish iterators. Apply
        iterators to generated clauses.

gcc/cp/
        * parser.cc (cp_parser_omp_clause_map): Parse 'iterator' modifier.
        * semantics.cc (finish_omp_clauses): Finish iterators. Apply
        iterators to generated clauses.

gcc/
        * gimplify.cc (compute_iterator_count): Make non-static. Take an
        iterator instead of a clause for an operand.
        (build_iterator_loop): Likewise.
        (gimplify_omp_depend): Pass iterator in call to
        compute_iterator_count and build_iterator_loop.
        (find_var_decl): New.
        (check_omp_map_iterators): New.
        (gimplify_scan_omp_clauses): Call check_omp_map_iterators on clauses
        with iterators.
        (gimplify_adjust_omp_clauses): Skip gimplification of clause decl and
        size for clauses with iterators.
        * omp-low.cc (struct iterator_loop_info_t): New type.
        (iterator_loop_map_t): New type.
        (lower_omp_map_iterators): New.
        (lower_omp_map_iterator_expr): New.
        (lower_omp_map_iterator_size): New.
        (finish_omp_map_iterators): New.
        (lower_omp_target): Call lower_omp_map_iterators on clauses with
        iterators. Call lower_omp_map_iterator_expr before assigning to
        sender ref. Call lower_omp_map_iterator_size before setting the
        size. Call finish_omp_map_iterators. Insert statements generated
        during iterator expansion before the statements for the target
        clause.
        * tree-pretty-print.cc (dump_omp_clause): Call dump_omp_iterators
        for iterators in map clauses.
        * tree.cc (omp_clause_num_ops): Add operand for OMP_CLAUSE_MAP.
        (walk_tree_1): Do not walk last operand of OMP_CLAUSE_MAP.
        * tree.h (OMP_CLAUSE_HAS_ITERATORS): New.
        (OMP_CLAUSE_ITERATORS): New.

gcc/testsuite/
        * c-c++-common/gomp/map-6.c (foo): Amend expected error message.
        * c-c++-common/gomp/target-map-iterators-1.c: New.
        * c-c++-common/gomp/target-map-iterators-2.c: New.
        * c-c++-common/gomp/target-map-iterators-3.c: New.
libgomp/ * target.c (kind_to_name): New. (gomp_merge_iterator_maps): New. (gomp_map_vars_internal): Call gomp_merge_iterator_maps. Copy address of only the first iteration to target vars. Free allocated variables. * testsuite/libgomp.c-c++-common/target-map-iterators-1.c: New. * testsuite/libgomp.c-c++-common/target-map-iterators-2.c: New. * testsuite/libgomp.c-c++-common/target-map-iterators-3.c: New. --- gcc/c/c-parser.cc | 59 +- gcc/c/c-typeck.cc | 22 ++- gcc/cp/parser.cc | 62 +
Re: [patch,avr] Implement TARGET_FLOATN_MODE
On 04.10.24 at 16:32, Jakub Jelinek wrote:
> On Fri, Oct 04, 2024 at 08:09:48AM -0600, Jeff Law wrote:
> > On 10/4/24 7:46 AM, Georg-Johann Lay wrote:
> > > This patch implements TARGET_FLOATN_MODE which maps
> > > _Float32[x] to SFmode and _Float64[x] to DFmode.
> > >
> > > There is currently no library support for extended float types,
> > > but these settings are more reasonable for avr (and they make
> > > more tests pass).
> > >
> > > Ok for trunk?
> > >
> > > Johann
> > >
> > > --
> > >
> > > AVR: Implement TARGET_FLOATN_MODE.
> > >
> > > gcc/
> > > * config/avr/avr.cc (avr_floatn_mode): New static function.
> > > (TARGET_FLOATN_MODE): New define.
> > OK
>
> This is certainly incorrect.
> As specified by e.g. ISO C23 H.2.3 Extended floating types, the requirement
> on the extended floating types is:
> "For each of its basic formats, IEC 60559 specifies an extended format whose
> maximum exponent and precision exceed those of the basic format it is
> associated with. Extended formats are intended for arithmetic with more
> precision and exponent range than is available in the basic formats used for
> the input data."
> So, while SFmode is a good mode to use for _Float32 and DFmode is a good
> mode to use for _Float64, SFmode isn't a good mode to use for _Float32x and
> neither is DFmode a good mode to use for _Float64x.
> I'd expect you want DFmode for _Float32x and opt_scalar_float_mode () for
> _Float64x.
>
> Jakub

Thanks for the clarification. So I guess that hook is not needed at all,
and the default implementation is already the best avr can do.

Johann
Re: [PATCH 2/3] Release expanded template argument vector
On Thu, 3 Oct 2024, Jason Merrill wrote: > On 10/3/24 12:38 PM, Jason Merrill wrote: > > On 10/2/24 7:50 AM, Richard Biener wrote: > > > This reduces peak memory usage by 20% for a specific testcase. > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > It's very ugly so I'd appreciate suggestions on how to handle such > > > situations better? > > > > I'm pushing this alternative patch, tested x86_64-pc-linux-gnu. > > OK, apparently that was both too clever and not clever enough. Replacing it > with this one that's much closer to yours. > > Jason > From: Jason Merrill > Date: Thu, 3 Oct 2024 16:31:00 -0400 > Subject: [PATCH] c++: free garbage vec in coerce_template_parms > To: gcc-patches@gcc.gnu.org > > coerce_template_parms can create two different vecs for the inner template > arguments, new_inner_args and (potentially) the result of > expand_template_argument_pack. One or the other, or possibly both, end up > being garbage: in the typical case, the expanded vec is garbage because it's > only used as the source for convert_template_argument. In some dependent > cases, the new vec is garbage because we decide to return the original args > instead. In these cases, ggc_free the garbage vec to reduce the memory > overhead of overload resolution. > > gcc/cp/ChangeLog: > > * pt.cc (coerce_template_parms): Free garbage vecs. > > Co-authored-by: Richard Biener > --- > gcc/cp/pt.cc | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index 20affcd65a2..4ceae1d38de 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms, > { > /* We don't know how many args we have yet, just use the >unconverted (and still packed) ones for now. 
*/ > + ggc_free (new_inner_args); > new_inner_args = orig_inner_args; > arg_idx = nargs; > break; > @@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms, > = make_pack_expansion (conv, complain); > >/* We don't know how many args we have yet, just > - use the unconverted ones for now. */ > + use the unconverted (but unpacked) ones for now. */ > + ggc_free (new_inner_args); I'm a bit worried about these ggc_frees. If an earlier template parameter is a constrained auto NTTP then new_inner_args/new_args could have been captured by the satisfaction cache during coercion for that argument, and so we'd be freeing a vector that's still live? >new_inner_args = inner_args; > arg_idx = nargs; >break; > @@ -9442,6 +9444,12 @@ coerce_template_parms (tree parms, > SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args, >TREE_VEC_LENGTH (new_inner_args)); > > + /* If we expanded packs in inner_args and aren't returning it now, the > + expanded vec is garbage. */ > + if (inner_args != new_inner_args > + && inner_args != orig_inner_args) > +ggc_free (inner_args); > + >return return_full_args ? new_args : new_inner_args; > } > > -- > 2.46.2 >
Re: [RFC PATCH] ARM: thumb1: fix bad code emitted when HI_REGS involved
Hello,

On Fri, 4 Oct 2024 at 16:48, Christophe Lyon wrote:
>
> Hi!
>
> On Mon, 8 Jul 2024 at 10:57, Siarhei Volkau wrote:
> >
> > ping
> >
> > On Thu, 20 Jun 2024 at 12:09, Siarhei Volkau wrote:
> > >
> > > This patch deals with consequences but not the root cause though.
> > >
> > > There are 5 cases which are subjects to rewrite:
> > > case #1:
> > >   mov ip, r1
> > >   add r2, ip
> > >   # ip is dead here
> > > can be rewritten as:
> > >   adds r2, r1
>
> Why replace 'add' with 'adds' ?
>
> Thanks,
>
> Christophe
>

Good catch, actually.
The silly answer is: because there's no alternative without {S} for Lo
registers in thumb1.
Correct me if I'm wrong, but I don't think we have to do anything special
with the CC reg there, because the conditional execution instructions
(thumb1_cbz, cbranchsi4_insn) take care of that.
See thumb1_final_prescan_insn.

Thanks
Siarhei

> > >
> > > case #2:
> > >   add ip, r1
> > >   mov r1, ip
> > >   # ip is dead here
> > > can be rewritten as:
> > >   add r1, ip
> > >
> > > case #3:
> > >   mov ip, r1
> > >   add r2, ip
> > >   add r3, ip
> > >   # ip is dead here
> > > can be rewritten as:
> > >   adds r2, r1
> > >   adds r3, r1
> > >
> > > case #4:
> > >   mov ip, r1
> > >   add ip, r2
> > >   mov r1, ip
> > > can be rewritten as:
> > >   adds r1, r2
> > >   mov ip, r1 <- might be eliminated too, if ip is dead
> > >
> > > case #5 (arbitrary):
> > >   mov r1, ip
> > >   subs r2, r1, r2
> > >   mov ip, r2
> > >   # r1 is dead here
> > > can be rewritten as:
> > >   rsbs r1, r2, #0
> > >   add ip, r1
> > >   movs r2, ip <- might be eliminated, if r2 is dead
> > >
> > > Speed profit wasn't checked but size changes are the following:
> > >   libgcc:     -132 bytes / -0.25%
> > >   libc:      -1262 bytes / -0.55%
> > >   libm:       -384 bytes / -0.42%
> > >   libstdc++: -2258 bytes / -0.30%
> > >
> > > No tests provided because its hard to force GCC to emit HI_REGS
> > > in a small and straightforward function.
> > > > > > Signed-off-by: Siarhei Volkau > > > --- > > > gcc/config/arm/thumb1.md | 93 +++- > > > 1 file changed, 92 insertions(+), 1 deletion(-) > > > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > > index d7074b43f60..9da4af9eccd 100644 > > > --- a/gcc/config/arm/thumb1.md > > > +++ b/gcc/config/arm/thumb1.md > > > @@ -2055,4 +2055,95 @@ (define_insn "thumb1_stack_protect_test_insn" > > > (set_attr "conds" "clob") > > > (set_attr "type" "multiple")] > > > ) > > > - > > > + > > > +;; bad code emitted when HI_REGS involved in addition > > > +;; subtract also might happen rarely > > > + > > > +;; case #1: > > > +;; mov ip, r1 > > > +;; add r2, ip # ip is dead after that > > > +(define_peephole2 > > > + [(set (match_operand:SI 0 "register_operand" "") > > > + (match_operand:SI 1 "register_operand" "")) > > > + (set (match_operand:SI 2 "register_operand" "") > > > + (plus:SI (match_dup 2) (match_dup 0)))] > > > + "TARGET_THUMB1 > > > +&& peep2_reg_dead_p (2, operands[0]) > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > > + [(set (match_dup 2) > > > + (plus:SI (match_dup 2) (match_dup 1)))] > > > + "") > > > + > > > +;; case #2: > > > +;; add ip, r1 > > > +;; mov r1, ip # ip is dead after that > > > +(define_peephole2 > > > + [(set (match_operand:SI 0 "register_operand" "") > > > + (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" > > > ""))) > > > + (set (match_dup 1) (match_dup 0))] > > > + "TARGET_THUMB1 > > > +&& peep2_reg_dead_p (2, operands[0]) > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > > + [(set (match_dup 1) > > > + (plus:SI (match_dup 1) (match_dup 0)))] > > > + "") > > > + > > > +;; case #3: > > > +;; mov ip, r1 > > > +;; add r2, ip > > > +;; add r3, ip # ip is dead after that > > > +(define_peephole2 > > > + [(set (match_operand:SI 0 "register_operand" "") > > > + (match_operand:SI 1 "register_operand" "")) > > > + (set (match_operand:SI 2 "register_operand" "") > > > + (plus:SI 
(match_dup 2) (match_dup 0))) > > > + (set (match_operand:SI 3 "register_operand" "") > > > + (plus:SI (match_dup 3) (match_dup 0)))] > > > + "TARGET_THUMB1 > > > +&& peep2_reg_dead_p (3, operands[0]) > > > +&& REGNO_REG_CLASS (REGNO (operands[0])) == HI_REGS" > > > + [(set (match_dup 2) > > > + (plus:SI (match_dup 2) (match_dup 1))) > > > + (set (match_dup 3) > > > + (plus:SI (match_dup 3) (match_dup 1)))] > > > + "") > > > + > > > +;; case #4: > > > +;; mov ip, r1 > > > +;; add ip, r2 > > > +;; mov r1, ip > > > +(define_peephole2 > > > + [(set (match_operand:SI 0 "register_operand" "") > > > + (match_operand:SI 1 "register_operand" "")) > > > + (set (match_dup 0) > > > + (plus:SI (match_dup 0) (match_operand:SI 2 "register_operand" > > > ""))) > > > + (set (match_dup 1) > > > + (match_dup 0))] > > > + "TARGET_THUMB1 > > > +&& REGNO_REG_CLASS (REGNO
Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
On Fri, Oct 04, 2024 at 10:03:36AM -0400, Siddhesh Poyarekar wrote: > > Add a new testcase that repeats 17_intro/names.cc but with > > _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like > > https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed > > now). > > > > libstdc++-v3/ChangeLog: > > > > PR libstdc++/116210 > > * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc > > that use it in the fortify wrappers. > > * testsuite/17_intro/names_fortify.cc: New test. > > --- > > libstdc++-v3/testsuite/17_intro/names.cc | 7 +++ > > libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++ > > 2 files changed, 13 insertions(+) > > create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc > > > > diff --git a/libstdc++-v3/testsuite/17_intro/names.cc > > b/libstdc++-v3/testsuite/17_intro/names.cc > > index 6b9a3639aad..bbf45b93dee 100644 > > --- a/libstdc++-v3/testsuite/17_intro/names.cc > > +++ b/libstdc++-v3/testsuite/17_intro/names.cc > > @@ -377,4 +377,11 @@ > > #undef y > > #endif > > +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE > > +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41) > > +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052 > > +# undef sz > > +# endif > > +#endif > > We've backported the fix to stable branches, so the version check isn't > really that reliable. That doesn't matter that much. The worst that happens is that with those older fixed glibc versions the testing will not test that symbol. What is more important is that it is checked on the latest glibc, so when people test gcc with that version, they'll notice if it regresses. Jakub
Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]
On Fri, Oct 4, 2024 at 11:58 AM Jakub Jelinek wrote: > > Hi! > > The PR notes that we don't emit optimal code for C++ spaceship > operator if the result is returned as an integer rather than the > result just being compared against different values and different > code executed based on that. > So e.g. for > template > auto foo (T x, T y) { return x <=> y; } > for both floating point types, signed integer types and unsigned integer > types. auto in that case is std::strong_ordering or std::partial_ordering, > which are fancy C++ abstractions around struct with signed char member > which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial > ordering (but for -ffast-math 2 is never the case). > I'm afraid functions like that are fairly common and unless they are > inlined, we really need to map the comparison to those -1, 0, 1 or > -1, 0, 1, 2 values. > > Now, for floating point spaceship I've in the past already added an > optimization (with tree-ssa-math-opts.cc discovery and named optab, the > optab only defined on x86 though right now), which ensures there is just > a single comparison instruction and then just tests based on flags. > Now, if we have code like: > auto a = x <=> y; > if (a == std::partial_ordering::less) > bar (); > else if (a == std::partial_ordering::greater) > baz (); > else if (a == std::partial_ordering::equivalent) > qux (); > else if (a == std::partial_ordering::unordered) > corge (); > etc., that results in decent code generation, the spaceship named pattern > on x86 optimizes for the jumps, so emits comparisons on the flags, followed > by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that > well. 
But if the result needs to be stored into an integer and just > returned that way or there are no immediate jumps based on it (or turned > into some non-standard integer values like -42, 0, 36, 75 etc.), then CE > doesn't do a good job for that, we end up with say > comiss %xmm1, %xmm0 > jp .L4 > seta%al > movl$0, %edx > leal-1(%rax,%rax), %eax > cmove %edx, %eax > ret > .L4: > movl$2, %eax > ret > The jp is good, that is the unlikely case and can't be easily handled in > straight line code due to the layout of the flags, but the rest uses cmov > which often isn't a win and a weird math. > With the patch below we can get instead > xorl%eax, %eax > comiss %xmm1, %xmm0 > jp .L2 > seta%al > sbbl$0, %eax > ret > .L2: > movl$2, %eax > ret > > The patch changes the discovery in the generic code, by detecting if > the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or > -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in > a new argument to .SPACESHIP ifn, so that the named pattern is told whether > it should optimize for branches or for loading the result into a -1, 0, 1 > (, 2) integer. Additionally, it doesn't detect just floating point <=> > anymore, but also integer and unsigned integer, but in those cases only > if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons > result in good code). > The backend then can for those integer or unsigned integer <=>s return > effectively (x > y) - (x < y) in a way that is efficient on the target > (so for x86 with ensuring zero initialization first when needed before > setcc; one for floating point and unsigned, where there is just one setcc > and the second one optimized into sbb instruction, two for the signed int > case). So e.g. 
for signed int we now emit > xorl%edx, %edx > xorl%eax, %eax > cmpl%esi, %edi > setl%dl > setg%al > subl%edx, %eax > ret > and for unsigned > xorl%eax, %eax > cmpl%esi, %edi > seta%al > sbbb$0, %al > ret > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Note, I wonder if other targets wouldn't benefit from defining the > named optab too... > > 2024-10-04 Jakub Jelinek > > PR middle-end/116896 > * optabs.def (spaceship_optab): Use spaceship$a4 rather than > spaceship$a3. > * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments > rather than 2, expand the last one, expect 4 operands of > spaceship_optab. > * tree-ssa-math-opts.cc: Include cfghooks.h. > (optimize_spaceship): Check if a single PHI is initialized to > -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new) > argument to .SPACESHIP and optimize away the comparisons, > otherwise pass 0. Also check for integer comparisons rather than > floating point, in that case do it only if there is a single PHI > with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP > if t
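As a reading aid for the message above, the value mapping it describes can be modelled in plain C++ without <compare>. This is only a sketch of the -1, 0, 1 (and 2 for unordered) results that the backend is asked to materialise, not the optab expansion itself; the names three_way_int and three_way_float are made up for the illustration.

```cpp
#include <cassert>
#include <limits>

// Integer <=> as an integer: (x > y) - (x < y) yields -1, 0 or 1.
// Works for both signed and unsigned operand types.
template <typename T>
inline int three_way_int(T x, T y)
{
  return (x > y) - (x < y);
}

// Floating point <=> as an integer: the same formula, plus 2 for the
// unordered case (at least one operand is a NaN).
inline int three_way_float(double x, double y)
{
  if (x != x || y != y)   // NaN is the only value that differs from itself
    return 2;
  return (x > y) - (x < y);
}

// Small self-check of the ordered cases.
inline void check_three_way()
{
  assert(three_way_int(1, 2) == -1);
  assert(three_way_int(2u, 2u) == 0);
  assert(three_way_float(2.0, 1.0) == 1);
}
```

The signed int sequence quoted above (setl, setg, subl) is a direct rendering of the (x > y) - (x < y) formula, while the unsigned and floating point sequences fold the second comparison into sbb after a single seta.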
Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
On Fri, 4 Oct 2024 at 15:05, Siddhesh Poyarekar wrote: > > On 2024-10-04 07:52, Jonathan Wakely wrote: > > This doesn't really belong in our testsuite, because the sole purpose of > > the new test is to find bugs in the Glibc wrappers (like the one linked > > below). But maybe it's a kindness to do it in our testsuite, because we > > already have this test in place, and one Glibc bug was already found > > thanks to Sam running the existing test with _FORTIFY_SOURCE defined. > > > > Should we do this? > > > > -- >8 -- > > > > Add a new testcase that repeats 17_intro/names.cc but with > > _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like > > https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed > > now). > > > > libstdc++-v3/ChangeLog: > > > > PR libstdc++/116210 > > * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc > > that use it in the fortify wrappers. > > * testsuite/17_intro/names_fortify.cc: New test. > > --- > > libstdc++-v3/testsuite/17_intro/names.cc | 7 +++ > > libstdc++-v3/testsuite/17_intro/names_fortify.cc | 6 ++ > > 2 files changed, 13 insertions(+) > > create mode 100644 libstdc++-v3/testsuite/17_intro/names_fortify.cc > > > > diff --git a/libstdc++-v3/testsuite/17_intro/names.cc > > b/libstdc++-v3/testsuite/17_intro/names.cc > > index 6b9a3639aad..bbf45b93dee 100644 > > --- a/libstdc++-v3/testsuite/17_intro/names.cc > > +++ b/libstdc++-v3/testsuite/17_intro/names.cc > > @@ -377,4 +377,11 @@ > > #undef y > > #endif > > > > +#if defined __GLIBC_PREREQ && defined _FORTIFY_SOURCE > > +# if __GLIBC_PREREQ(2,35) && ! __GLIBC_PREREQ(2,41) > > +// https://sourceware.org/bugzilla/show_bug.cgi?id=32052 > > +# undef sz > > +# endif > > +#endif > > We've backported the fix to stable branches, so the version check isn't > really that reliable. Yeah, but it doesn't matter if we #undef sz on some Glibc systems that don't actually have the bug.
Re: [patch,avr] Implement TARGET_FLOATN_MODE
On Fri, Oct 04, 2024 at 08:09:48AM -0600, Jeff Law wrote:
>
>
> On 10/4/24 7:46 AM, Georg-Johann Lay wrote:
> > This patch implements TARGET_FLOATN_MODE which maps
> > _Float32[x] to SFmode and _Float64[x] to DFmode.
> >
> > There is currently no library support for extended float types,
> > but these settings are more reasonable for avr (and they make
> > more tests pass).
> >
> > Ok for trunk?
> >
> > Johann
> >
> > --
> >
> > AVR: Implement TARGET_FLOATN_MODE.
> >
> > gcc/
> > * config/avr/avr.cc (avr_floatn_mode): New static function.
> > (TARGET_FLOATN_MODE): New define.
> OK

This is certainly incorrect.
As specified by e.g. ISO C23 H.2.3 Extended floating types, the requirement
on the extended floating types is:
"For each of its basic formats, IEC 60559 specifies an extended format whose
maximum exponent and precision exceed those of the basic format it is
associated with. Extended formats are intended for arithmetic with more
precision and exponent range than is available in the basic formats used for
the input data."
So, while SFmode is a good mode to use for _Float32 and DFmode is a good mode
to use for _Float64, SFmode isn't a good mode to use for _Float32x and
neither is DFmode a good mode to use for _Float64x.
I'd expect you want DFmode for _Float32x and opt_scalar_float_mode () for
_Float64x.

Jakub
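The mapping suggested in the review can be sketched as a small standalone function. This is only an illustration under assumptions: `Mode` and `floatn_mode_sketch` are hypothetical stand-ins, not GCC's real machine modes or the actual `avr_floatn_mode`, and an empty `std::optional` models the "no mode" result that the message writes as `opt_scalar_float_mode ()`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for GCC's SFmode and DFmode; not the real GCC types.
enum class Mode { SF, DF };

// Sketch of the suggested mapping: _FloatN gets the mode of matching width,
// while _FloatNx must exceed _FloatN in precision and exponent range.
// An empty optional means "this type is not supported".
std::optional<Mode> floatn_mode_sketch(int n, bool extended)
{
  if (n == 32) {
    if (extended)
      return Mode::DF;      // _Float32x: must be wider than _Float32
    return Mode::SF;        // _Float32
  }
  if (n == 64) {
    if (extended)
      return std::nullopt;  // _Float64x: nothing wider than DF is assumed here
    return Mode::DF;        // _Float64
  }
  return std::nullopt;      // other widths are not modelled in this sketch
}
```

The point of the sketch is the asymmetry: the plain and the extended type of the same width never share a mode.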
Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm
Hello Christophe,

Christophe Lyon writes:

> On Fri, 1 Mar 2024 at 15:29, Richard Earnshaw (lists)
> wrote:
>>
>> On 01/03/2024 14:23, Andre Vieira (lists) wrote:
>> > Hi Thiago,
>> >
>> > Thanks for this, LGTM but I can't approve this, CC'ing Richard.
>> >
>> > Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite'
>> > from bullet
>> > points 2-4.
>> >
>>
>> Yes, this is OK with the change Andre mentioned (your push will fail if you
>> don't fix
>> that).
>>
>> PS, if you've set up GCC git customizations (see
>> contrib/gcc-git-customization.sh), you
>> can verify things like this with 'git gcc-verify HEAD^..HEAD'
>>
>
> ISTM you have forgotten to commit this patch.
> If you don't have commit rights, I can do it for you.

That is true, sorry about that. I just pushed the patch as commit
115857bf1e32. It incorporates Andre's ChangeLog fix and git gcc-verify
says the commit is OK.

Thank you for reminding me.

--
Thiago
Re: [PATCH] aarch64: Fix bug with max/min (PR116934)
Hi Saurabh,

This looks good, one little nit:

> gcc/ChangeLog:
>
> * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
> UNSPEC_COND_SMIN to correct iterators.

This should also have the PR target/116934 before it - it's fine to change it
when you commit. Speaking of which, can we try getting this committed before
the weekend so the benchmark runs will work again?

Cheers,
Wilco
[patch,testsuite] Some float64 and float32x test require double64plus.
Some of the float64 and float32x test cases use double built-ins and
hence require double64plus or double_float32xplus, respectively (i.e.
double is at least as good as float32x).

This patch adds the corresponding dg-require-effective-target filters
(but only for test cases that I could verify work with double64+ and
fail with double32).

Ok for trunk?

Johann

--

testsuite - Some float64 and float32x tests require double64plus.

Some of the float64 and float32x test cases use double built-ins and
hence require double64plus, or that double is at least as good as
float32x (double_float32xplus).

gcc/testsuite/
    * lib/target-supports.exp
    (check_effective_target_double_float32xplus): New proc.
    * gcc.dg/torture/float32x-builtin.c: Add dg-require-effective-target
    double_float32xplus.
    * gcc.dg/torture/float32x-tg-2.c: Same.
    * gcc.dg/torture/float32x-tg.c: Same.
    * gcc.dg/torture/float64-builtin.c: Add dg-require-effective-target
    double64plus.
    * gcc.dg/torture/float64-tg-2.c: Same.
    * gcc.dg/torture/float64-tg.c: Same.
diff --git a/gcc/testsuite/gcc.dg/torture/float32x-builtin.c b/gcc/testsuite/gcc.dg/torture/float32x-builtin.c index 71eb7e2cdc8..0404d392705 100644 --- a/gcc/testsuite/gcc.dg/torture/float32x-builtin.c +++ b/gcc/testsuite/gcc.dg/torture/float32x-builtin.c @@ -4,6 +4,7 @@ /* { dg-add-options float32x } */ /* { dg-add-options ieee } */ /* { dg-require-effective-target float32x_runtime } */ +/* { dg-require-effective-target double_float32xplus } */ #define WIDTH 32 #define EXT 1 diff --git a/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c b/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c index 6179aba7cdd..dd7e2064a1a 100644 --- a/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c +++ b/gcc/testsuite/gcc.dg/torture/float32x-tg-2.c @@ -4,6 +4,7 @@ /* { dg-add-options float32x } */ /* { dg-add-options ieee } */ /* { dg-require-effective-target float32x_runtime } */ +/* { dg-require-effective-target double_float32xplus } */ #define WIDTH 32 #define EXT 1 diff --git a/gcc/testsuite/gcc.dg/torture/float32x-tg.c b/gcc/testsuite/gcc.dg/torture/float32x-tg.c index b65b03f558b..87d9bef2b03 100644 --- a/gcc/testsuite/gcc.dg/torture/float32x-tg.c +++ b/gcc/testsuite/gcc.dg/torture/float32x-tg.c @@ -4,6 +4,7 @@ /* { dg-add-options float32x } */ /* { dg-add-options ieee } */ /* { dg-require-effective-target float32x_runtime } */ +/* { dg-require-effective-target double_float32xplus } */ #define WIDTH 32 #define EXT 1 diff --git a/gcc/testsuite/gcc.dg/torture/float64-builtin.c b/gcc/testsuite/gcc.dg/torture/float64-builtin.c index 413768443ae..2462017e4d5 100644 --- a/gcc/testsuite/gcc.dg/torture/float64-builtin.c +++ b/gcc/testsuite/gcc.dg/torture/float64-builtin.c @@ -4,6 +4,7 @@ /* { dg-add-options float64 } */ /* { dg-add-options ieee } */ /* { dg-require-effective-target float64_runtime } */ +/* { dg-require-effective-target double64plus } */ #define WIDTH 64 #define EXT 0 diff --git a/gcc/testsuite/gcc.dg/torture/float64-tg-2.c b/gcc/testsuite/gcc.dg/torture/float64-tg-2.c index 
d0e4316611f..f034e76cfeb 100644 --- a/gcc/testsuite/gcc.dg/torture/float64-tg-2.c +++ b/gcc/testsuite/gcc.dg/torture/float64-tg-2.c @@ -4,6 +4,7 @@ /* { dg-add-options float64 } */ /* { dg-add-options ieee } */ /* { dg-require-effective-target float64_runtime } */ +/* { dg-require-effective-target double64plus } */ #define WIDTH 64 #define EXT 0 diff --git a/gcc/testsuite/gcc.dg/torture/float64-tg.c b/gcc/testsuite/gcc.dg/torture/float64-tg.c index a7188312d57..d17ee0ecb19 100644 --- a/gcc/testsuite/gcc.dg/torture/float64-tg.c +++ b/gcc/testsuite/gcc.dg/torture/float64-tg.c @@ -4,6 +4,7 @@ /* { dg-add-options float64 } */ /* { dg-add-options ieee } */ /* { dg-require-effective-target float64_runtime } */ +/* { dg-require-effective-target double64plus } */ #define WIDTH 64 #define EXT 0 diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index f92f7f1af9c..459af8e58c6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3965,6 +3
[PATCH 0/8] aarch64: Add new flags for existing features
This patch series adds 7 new flags for features that were previously available in GCC only as part of an architecture version. It also fixes one other instance where an architecture version was used in a check instead of a feature flag. Bootstrapped and regression tested as a whole on aarch64. I additionally ran the cpunative tests after each patch in the series. Ok for master?
Re: [PATCH] libstdc++: Implement LWG 3664 changes to ranges::distance
On Fri, 4 Oct 2024 at 19:37, Patrick Palka wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/backports?

OK for all branches (assuming we already have LWG 3392 on the branches).

>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_base.h (__distance_fn::operator()):
> Adjust iterator/sentinel overloads as per LWG 3664.
> * testsuite/24_iterators/range_operations/distance.cc:
> Test LWG 3664 example.
> ---
> libstdc++-v3/include/bits/ranges_base.h| 14 +++---
> .../24_iterators/range_operations/distance.cc | 11 +++
> 2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/ranges_base.h
> b/libstdc++-v3/include/bits/ranges_base.h
> index 137c3c98e14..cb2eba1f841 100644
> --- a/libstdc++-v3/include/bits/ranges_base.h
> +++ b/libstdc++-v3/include/bits/ranges_base.h
> @@ -947,7 +947,9 @@ namespace ranges
>
>    struct __distance_fn final
>    {
> -    template<input_or_output_iterator _It, sentinel_for<_It> _Sent>
> +    // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +    // 3664. LWG 3392 broke std::ranges::distance(a, a+3)
> +    template<typename _It, sentinel_for<_It> _Sent>
>        requires (!sized_sentinel_for<_Sent, _It>)
>        constexpr iter_difference_t<_It>
>        operator()[[nodiscard]](_It __first, _Sent __last) const
> @@ -961,13 +963,11 @@ namespace ranges
>         return __n;
>        }
>
> -    template<input_or_output_iterator _It, sized_sentinel_for<_It> _Sent>
> +    template<typename _It, sized_sentinel_for<decay_t<_It>> _Sent>
>        [[nodiscard]]
> -      constexpr iter_difference_t<_It>
> -      operator()(const _It& __first, const _Sent& __last) const
> -      {
> -       return __last - __first;
> -      }
> +      constexpr iter_difference_t<decay_t<_It>>
> +      operator()(_It&& __first, _Sent __last) const
> +      { return __last - static_cast<const decay_t<_It>&>(__first); }
>
>      template<range _Range>
>        [[nodiscard]]
> diff --git a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> index 9a1d0c3efe8..336956936c2 100644
> --- a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> +++ b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
> @@ -144,6 +144,16 @@ test05()
>    VERIFY( std::ranges::distance(c4) == 5 );
>  }
>
> +void
> +test06()
> +{
> +  // LWG 3664 - LWG 3392 broke std::ranges::distance(a, a+3)
> +  int a[] = {1, 2, 3};
> +  VERIFY( std::ranges::distance(a, a+3) == 3 );
> +  VERIFY( std::ranges::distance(a, a) == 0 );
> +  VERIFY( std::ranges::distance(a+3, a) == -3 );
> +}
> +
>  int
>  main()
>  {
> @@ -152,4 +162,5 @@ main()
>    test03();
>    test04();
>    test05();
> +  test06();
>  }
> --
> 2.47.0.rc1
>
Re: [PATCH 0/8] aarch64: Add new flags for existing features
On Fri, Oct 4, 2024 at 10:51 AM Andrew Carlotti wrote:
>
> This patch series adds 7 new flags for features that were previously available
> in GCC only as part of an architecture version. It also fixes one other
> instance where an architecture version was used in a check instead of a
> feature
> flag.
>
> Bootstrapped and regression tested as a whole on aarch64. I additionally ran
> the cpunative tests after each patch in the series. Ok for master?

I think this is good except there is no modification of the documentation.
Yes, the feature flags are documented; see
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#g_t-march-and--mcpu-Feature-Modifiers .

Thanks,
Andrew
[COMMITTED] MAINTAINERS: Add myself to write after approval
ChangeLog:

    * MAINTAINERS: Add myself to write after approval.
---
Hello,

I just noticed that I wasn't yet in the write after approval section, so I
just committed this patch.

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ded5b3d4f643..9257b33ff089 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -344,6 +344,7 @@ Richard Ball                    ricbal02
 Scott Bambrough                 -
 Wolfgang Bangerth               -
 Gergö Barany                    -
+Thiago Jung Bauermann           -
 Charles Baylis                  cbaylis
 Tejas Belagod                   belagod
 Andrey Belevantsev              abel

base-commit: 385a232229a5b4ee3f4d2a2472bcda28cd8d17b2
[PATCH 6/8] aarch64: Add new +rcpc2 flag
gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2. * config/aarch64/aarch64-option-extensions.def (RCPC2): New flag. (RCPC3): Add RCPC2 dependency. * config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to expected feature string instead of rcpc. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto. diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 84782d55089650b5854c60497bc68f9564d6f90b..f182d3dc6c77bf63ab272ab1b5824c1523390e09 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -34,7 +34,7 @@ AARCH64_ARCH("armv8-a", generic_armv8_a, V8A, 8, (SIMD)) AARCH64_ARCH("armv8.1-a", generic_armv8_a, V8_1A, 8, (V8A, LSE, CRC, RDMA)) AARCH64_ARCH("armv8.2-a", generic_armv8_a, V8_2A, 8, (V8_1A)) AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, PAUTH, RCPC, FCMA, JSCVT)) -AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM)) +AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM, RCPC2)) AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES, FRINTTS, FLAGM2)) AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, I8MM, BF16)) AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A)) diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index b73324abbeb6145b5a2c26fdb22f41de9b6045d9..b929773eba176a391d6e9242067e4f63e4434637 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -159,7 +159,9 @@ AARCH64_OPT_FMV_EXTENSION("fcma", FCMA, (SIMD), (), (), "fcma") AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc") -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3") +AARCH64_OPT_FMV_EXTENSION("rcpc2", RCPC2, (RCPC), (), (), "ilrcpc") 
+ +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC2), (), (), "lrcpc3") AARCH64_OPT_FMV_EXTENSION("frintts", FRINTTS, (FP), (), (), "frint") diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 41430466b50bf223bf008c753d24f57570c1f2e5..3ed1930d3e4ac9f250219a43aa91cb8ed123f53c 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -427,7 +427,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED /* The RCPC2 extensions from Armv8.4-a that allow immediate offsets to LDAPR and sign-extending versions.*/ -#define TARGET_RCPC2 ((AARCH64_HAVE_ISA (V8_4A) && TARGET_RCPC) || TARGET_RCPC3) +#define TARGET_RCPC2 AARCH64_HAVE_ISA (RCPC2) /* RCPC3 (Release Consistency) extensions, optional from Armv8.2-a. */ #define TARGET_RCPC3 AARCH64_HAVE_ISA (RCPC3) diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c index c1d5896e1eb0b3b48ac0c1eeb95a74c4b6ec9e85..904cdf452263961442f3ecc31cd1b6563130f9c7 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c @@ -7,7 +7,7 @@ int main() return 0; } -/* { dg-final { scan-assembler {\.arch armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */ +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc2\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */ /* Check that an Armv8-A core doesn't fall apart on extensions without midr values. 
*/ diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c index 4533a2bf5912dc609327b63164ba4577e98f9eec..feb959b11b0e383a5e1f3214d55f80f56d2605d4 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c @@ -7,7 +7,7 @@ int main() return 0; } -/* { dg-final { scan-assembler {\.arch armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */ +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc2\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */ /* Check that an Armv8-A core doesn't fall apart on extensions without midr values and that it enables optional features. */
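As a rough illustration of what the TARGET_RCPC2 change above means, the following sketch models feature bits and their dependencies in plain C++. Everything in it is an assumption made for illustration: the `F_*` constants, `with_dependencies`, and the two predicate functions are hypothetical names, and GCC's real flag handling is generated from the .def files rather than written like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits, for illustration only.
enum : std::uint32_t {
  F_RCPC  = 1u << 0,
  F_RCPC2 = 1u << 1,
  F_RCPC3 = 1u << 2,
  F_V8_4A = 1u << 3,
};

// Add everything each enabled feature depends on, until nothing changes.
constexpr std::uint32_t with_dependencies(std::uint32_t flags)
{
  for (;;) {
    std::uint32_t next = flags;
    if (next & F_V8_4A) next |= F_RCPC2;  // armv8.4-a lists RCPC2 after the patch
    if (next & F_RCPC3) next |= F_RCPC2;  // rcpc3 depends on rcpc2 after the patch
    if (next & F_RCPC2) next |= F_RCPC;   // rcpc2 depends on rcpc
    if (next == flags) return flags;
    flags = next;
  }
}

// Shape of the old check: (Armv8.4-A and RCPC) or RCPC3.
constexpr bool old_target_rcpc2(std::uint32_t flags)
{ return ((flags & F_V8_4A) && (flags & F_RCPC)) || (flags & F_RCPC3); }

// Shape of the new check: a single feature bit.
constexpr bool new_target_rcpc2(std::uint32_t flags)
{ return (flags & F_RCPC2) != 0; }
```

In this model the two checks agree for armv8.4-a and for +rcpc3, while a bare +rcpc2 satisfies only the new check, which is the case the new flag is meant to cover.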
[PATCH 8/8] aarch64: Add new +xs flag
GCC does not emit tlbi instructions, so this only affects the flags passed through to the assembler. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_7A): Add XS. * config/aarch64/aarch64-option-extensions.def (XS): New flag. diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index fa06377dda089c8a89628bc4cc66d54510346053..66fe5cef0896847715d3b0a404ebabedfc82f34d 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -37,7 +37,7 @@ AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, PAUTH, R AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM, RCPC2)) AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES, FRINTTS, FLAGM2)) AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, I8MM, BF16)) -AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A, WFXT)) +AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A, WFXT, XS)) AARCH64_ARCH("armv8.8-a", generic_armv8_a, V8_8A, 8, (V8_7A, MOPS)) AARCH64_ARCH("armv8.9-a", generic_armv8_a, V8_9A, 8, (V8_8A, CSSC)) AARCH64_ARCH("armv8-r", generic_armv8_a, V8R , 8, (V8_4A)) diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 9781d48f63778d186b66427bae7deb2c01e14107..93adb556276c2379f50805d40d891229c87e1783 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -222,6 +222,8 @@ AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "") AARCH64_OPT_FMV_EXTENSION("wfxt", WFXT, (), (), (), "wfxt") +AARCH64_OPT_EXTENSION("xs", XS, (), (), (), "") + AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "") AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))
[PATCH 1/8] aarch64: Use PAUTH instead of V8_3A in some places
gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_expand_epilogue): Use TARGET_PAUTH. * config/aarch64/aarch64.md: Update comment. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index e7bb3278a27eca44c46afd26069d608218198a54..cf1107127fd5d9e12ad42441528666bf6b733f73 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -10042,12 +10042,12 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall) 1) Sibcalls don't return in a normal way, so if we're about to call one we must authenticate. - 2) The RETAA instruction is not available before ARMv8.3-A, so if we are - generating code for !TARGET_ARMV8_3 we can't use it and must + 2) The RETAA instruction is not available without FEAT_PAuth, so if we + are generating code for !TARGET_PAUTH we can't use it and must explicitly authenticate. */ if (aarch64_return_address_signing_enabled () - && (sibcall || !TARGET_ARMV8_3)) + && (sibcall || !TARGET_PAUTH)) { switch (aarch64_ra_sign_key) { diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index c54b29cd64b9e0dc6c6d12735049386ccedc5408..0940a84f9295ee2bc07282b150095fdb5af11a4d 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7672,10 +7672,10 @@ ) ;; Pointer authentication patterns are always provided. In architecture -;; revisions prior to ARMv8.3-A these HINT instructions operate as NOPs. +;; revisions prior to FEAT_PAuth these HINT instructions operate as NOPs. ;; This lets the user write portable software which authenticates pointers -;; when run on something which implements ARMv8.3-A, and which runs -;; correctly, but does not authenticate pointers, where ARMv8.3-A is not +;; when run on something which implements FEAT_PAuth, and which runs +;; correctly, but does not authenticate pointers, where FEAT_PAuth is not ;; implemented. ;; Signing/Authenticating R30 using SP as the salt.
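The epilogue condition touched by this patch can be read as a small truth table. The sketch below restates it as a free function; `needs_explicit_authentication` and its parameter names are hypothetical, standing in for `aarch64_return_address_signing_enabled ()`, the `sibcall` argument and `TARGET_PAUTH` from the hunk above.

```cpp
#include <cassert>

// Hypothetical restatement of the condition in aarch64_expand_epilogue:
// explicit authentication is needed when return-address signing is on and
// either we are about to make a sibcall or RETAA is unavailable.
constexpr bool needs_explicit_authentication(bool signing_enabled,
                                             bool sibcall,
                                             bool have_pauth)
{
  return signing_enabled && (sibcall || !have_pauth);
}
```

With signing enabled, a normal return on a PAuth-capable target is the only case that can rely on RETAA instead.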
[PATCH 5/8] aarch64: Add new +flagm2 flag
GCC does not currently emit the axflag or xaflag instructions, so this primarily affects the flags passed through to the assembler. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2. * config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to expected feature string instead of flagm. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto. diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 668e7833bd81a7d8795df022f205ca7ca0d0ddef..84782d55089650b5854c60497bc68f9564d6f90b 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -35,7 +35,7 @@ AARCH64_ARCH("armv8.1-a", generic_armv8_a, V8_1A, 8, (V8A, LSE, CRC, AARCH64_ARCH("armv8.2-a", generic_armv8_a, V8_2A, 8, (V8_1A)) AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, PAUTH, RCPC, FCMA, JSCVT)) AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM)) -AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES, FRINTTS)) +AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES, FRINTTS, FLAGM2)) AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, I8MM, BF16)) AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A)) AARCH64_ARCH("armv8.8-a", generic_armv8_a, V8_8A, 8, (V8_7A, MOPS)) diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 505f1fb721c64e4b55b52baf465024a57c68ab98..b73324abbeb6145b5a2c26fdb22f41de9b6045d9 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -103,6 +103,8 @@ AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng") AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm") +AARCH64_OPT_FMV_EXTENSION("flagm2", FLAGM2, (FLAGM), (), (), "flagm2") + 
AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics") AARCH64_OPT_FMV_EXTENSION("fp", FP, (), (), (), "fp") diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c index aa70d1d22b8299befcd81a696f051eb72997d548..c1d5896e1eb0b3b48ac0c1eeb95a74c4b6ec9e85 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c @@ -7,7 +7,7 @@ int main() return 0; } -/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */ +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */ /* Check that an Armv8-A core doesn't fall apart on extensions without midr values. */ diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c index ccd5d0d9bb7d7bf722bcffcc14c46d88d3223cf3..4533a2bf5912dc609327b63164ba4577e98f9eec 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c @@ -7,7 +7,7 @@ int main() return 0; } -/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */ +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm2\+lse\+dotprod\+rdma\+crc\+fp16fml\+jscvt\+rcpc\+frintts\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */ /* Check that an Armv8-A core doesn't fall apart on extensions without midr values and that it enables optional features. */
[PATCH 7/8] aarch64: Add new +wfxt flag
GCC does not currently emit the wfet or wfit instructions, so this primarily affects the flags passed through to the assembler. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_7A): Add WFXT. * config/aarch64/aarch64-option-extensions.def (WFXT): New flag. diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index f182d3dc6c77bf63ab272ab1b5824c1523390e09..fa06377dda089c8a89628bc4cc66d54510346053 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -37,7 +37,7 @@ AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, PAUTH, R AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM, RCPC2)) AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES, FRINTTS, FLAGM2)) AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, I8MM, BF16)) -AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A)) +AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A, WFXT)) AARCH64_ARCH("armv8.8-a", generic_armv8_a, V8_8A, 8, (V8_7A, MOPS)) AARCH64_ARCH("armv8.9-a", generic_armv8_a, V8_9A, 8, (V8_8A, CSSC)) AARCH64_ARCH("armv8-r", generic_armv8_a, V8R , 8, (V8_4A)) diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index b929773eba176a391d6e9242067e4f63e4434637..9781d48f63778d186b66427bae7deb2c01e14107 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -220,6 +220,8 @@ AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca pacg") AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "") +AARCH64_OPT_FMV_EXTENSION("wfxt", WFXT, (), (), (), "wfxt") + AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "") AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))
Re: [PATCH 1/2] aarch64: Split FCMA feature bit from Armv8.3-A
On Wed, Oct 02, 2024 at 06:13:38PM +0100, Andre Vieira wrote: > > This patch splits out FCMA as a feature from Armv8.3-A and adds it as a > separate > feature bit which now controls 'TARGET_COMPLEX'. > > gcc/ChangeLog: > > * config/aarch64/aarch64-arches.def (FCMA): New feature bit, can not be > used as an extension in the command-line. > * config/aarch64/aarch64.h (TARGET_COMPLEX): Use FCMA feature bit > rather than ARMV8_3. > --- > gcc/config/aarch64/aarch64-arches.def| 2 +- > gcc/config/aarch64/aarch64-option-extensions.def | 1 + > gcc/config/aarch64/aarch64.h | 2 +- > 3 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-arches.def > b/gcc/config/aarch64/aarch64-arches.def > index 4634b272e28..fadf9c36b03 100644 > --- a/gcc/config/aarch64/aarch64-arches.def > +++ b/gcc/config/aarch64/aarch64-arches.def > @@ -33,7 +33,7 @@ > AARCH64_ARCH("armv8-a", generic_armv8_a, V8A, 8, (SIMD)) > AARCH64_ARCH("armv8.1-a", generic_armv8_a, V8_1A, 8, (V8A, LSE, > CRC, RDMA)) > AARCH64_ARCH("armv8.2-a", generic_armv8_a, V8_2A, 8, (V8_1A)) > -AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, > PAUTH, RCPC)) > +AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, > PAUTH, RCPC, FCMA)) > AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, > F16FML, DOTPROD, FLAGM)) > AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, > SSBS, PREDRES)) > AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, > I8MM, BF16)) > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def > b/gcc/config/aarch64/aarch64-option-extensions.def > index 6998627f377..4732c20ec96 100644 > --- a/gcc/config/aarch64/aarch64-option-extensions.def > +++ b/gcc/config/aarch64/aarch64-option-extensions.def > @@ -193,6 +193,7 @@ AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), > (), (), "svesm4") > AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4)) > > AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme") > 
+AARCH64_OPT_EXTENSION("", FCMA, (), (), (), "fcma") > > AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "") > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index a99e7bb6c47..c0ad305e324 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -362,7 +362,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE > ATTRIBUTE_UNUSED > #define TARGET_JSCVT (TARGET_FLOAT && TARGET_ARMV8_3) > > /* Armv8.3-a Complex number extension to AdvSIMD extensions. */ > -#define TARGET_COMPLEX (TARGET_SIMD && TARGET_ARMV8_3) > +#define TARGET_COMPLEX (TARGET_SIMD && AARCH64_HAVE_ISA (FCMA)) > > /* Floating-point rounding instructions from Armv8.5-a. */ > #define TARGET_FRINT (AARCH64_HAVE_ISA (V8_5A) && TARGET_FLOAT) This patch doesn't work (as I know you're already aware). I've posted a more complete patch to split out FCMA, which can replace this one. https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664568.html
[PATCH] libstdc++: Implement LWG 3664 changes to ranges::distance
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/backports?

-- >8 --

libstdc++-v3/ChangeLog:

    * include/bits/ranges_base.h (__distance_fn::operator()):
    Adjust iterator/sentinel overloads as per LWG 3664.
    * testsuite/24_iterators/range_operations/distance.cc:
    Test LWG 3664 example.
---
 libstdc++-v3/include/bits/ranges_base.h| 14 +++---
 .../24_iterators/range_operations/distance.cc | 11 +++
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_base.h b/libstdc++-v3/include/bits/ranges_base.h
index 137c3c98e14..cb2eba1f841 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -947,7 +947,9 @@ namespace ranges

   struct __distance_fn final
   {
-    template<input_or_output_iterator _It, sentinel_for<_It> _Sent>
+    // _GLIBCXX_RESOLVE_LIB_DEFECTS
+    // 3664. LWG 3392 broke std::ranges::distance(a, a+3)
+    template<typename _It, sentinel_for<_It> _Sent>
       requires (!sized_sentinel_for<_Sent, _It>)
       constexpr iter_difference_t<_It>
       operator()[[nodiscard]](_It __first, _Sent __last) const
@@ -961,13 +963,11 @@ namespace ranges
        return __n;
       }

-    template<input_or_output_iterator _It, sized_sentinel_for<_It> _Sent>
+    template<typename _It, sized_sentinel_for<decay_t<_It>> _Sent>
       [[nodiscard]]
-      constexpr iter_difference_t<_It>
-      operator()(const _It& __first, const _Sent& __last) const
-      {
-       return __last - __first;
-      }
+      constexpr iter_difference_t<decay_t<_It>>
+      operator()(_It&& __first, _Sent __last) const
+      { return __last - static_cast<const decay_t<_It>&>(__first); }

     template<range _Range>
       [[nodiscard]]
diff --git a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
index 9a1d0c3efe8..336956936c2 100644
--- a/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
+++ b/libstdc++-v3/testsuite/24_iterators/range_operations/distance.cc
@@ -144,6 +144,16 @@ test05()
   VERIFY( std::ranges::distance(c4) == 5 );
 }

+void
+test06()
+{
+  // LWG 3664 - LWG 3392 broke std::ranges::distance(a, a+3)
+  int a[] = {1, 2, 3};
+  VERIFY( std::ranges::distance(a, a+3) == 3 );
+  VERIFY( std::ranges::distance(a, a) == 0 );
+  VERIFY( std::ranges::distance(a+3, a) == -3 );
+}
+
 int
 main()
 {
@@ -152,4 +162,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
--
2.47.0.rc1
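Why an array argument is the problem case can be seen from template deduction alone, without calling `std::ranges::distance` at all. The sketch below is not libstdc++ code: `deduced_as_array_by_cref` and `decays_to_pointer` are hypothetical helpers that only mirror the parameter shapes of the old overload (`const _It&`) and the new one (`_It&&` with `decay_t`).

```cpp
#include <cassert>
#include <type_traits>

// Parameter shape of the pre-3664 overload: taken by const reference, so an
// array argument is deduced as an array type and never decays to a pointer.
template<typename It>
constexpr bool deduced_as_array_by_cref(const It&)
{ return std::is_array_v<It>; }

// Parameter shape of the post-3664 overload: a forwarding reference whose
// type is decayed explicitly, so an array argument is treated as a pointer.
template<typename It>
constexpr bool decays_to_pointer(It&&)
{ return std::is_pointer_v<std::decay_t<It>>; }
```

For `int a[3]`, the first helper sees `int[3]`, which is not an iterator, while the second sees `int*` after decay; that is the difference the test06 case in the patch exercises.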
Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
Jakub Jelinek writes:

> On Fri, Oct 04, 2024 at 12:52:11PM +0100, Jonathan Wakely wrote:
>> This doesn't really belong in our testsuite, because the sole purpose of
>> the new test is to find bugs in the Glibc wrappers (like the one linked
>> below). But maybe it's a kindness to do it in our testsuite, because we
>> already have this test in place, and one Glibc bug was already found
>> thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
>>
>> Should we do this?
>
> I think so. While those bugs are glibc bugs, libstdc++ uses libc headers
> and so if they have namespace cleanness issues, so does libstdc++.
>
>> Add a new testcase that repeats 17_intro/names.cc but with
>> _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
>> https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
>> now).

I think yes as well -- we've had a lot of discussions in glibc about
getting to a place where we have tests to check the usability of headers
(there's some for this specific namespace problem but there's some
bigger stuff wrt parsing from Clang and so on) but we're not there yet.

This feels like a cheap way of catching issues, and the fact that nobody
noticed between 2.35 and 2.40 (i.e. ~3 years) means it's worthwhile IMO.

>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/116210
>> * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
>> that use it in the fortify wrappers.
>> * testsuite/17_intro/names_fortify.cc: New test.
>
> Jakub

thanks,
sam
Re: [PATCH 1/2] gcc: make Valgrind errors fatal during bootstrap
Jeff Law writes:

> On 10/2/24 8:39 PM, Sam James wrote:
>> Valgrind doesn't error out by default which means bootstrap issues like
>> in PR116945 can easily be missed: pass --error-exitcode=1 to handle this.
>> While here, also set --trace-children=yes to cover child processes
>> of tools invoked during the build.
>> Note that this only handles tools invoked during the build, it doesn't
>> cover everything that --enable-checking=valgrind does.
>> gcc/ChangeLog:
>>     PR other/116945
>>     PR other/116947
>>     * configure: Regenerate.
>>     * configure.ac (valgrind_cmd): Pass additional options.
> But is this going to cause all bootstraps with Ada to fail?  That's
> how I read 116945 which was closed as WONTFIX.  Or am I
> mis-interpreting that BZ and its interaction with this patch?

No, you're right, I consider this on pause unless/until we figure out
that bug -- I'm speaking with mjw about some ideas.

>
> jeff

thanks,
sam
Re: [PATCH 2/3] aarch64: libgcc: add prototypes in cpuinfo
> On 3 Oct 2024, at 21:44, Christophe Lyon wrote:
>
> External email: Use caution opening links or attachments
>
>
> Add prototypes for __init_cpu_features_resolver and
> __init_cpu_features to avoid warnings due to -Wmissing-prototypes.
>
> libgcc/
>     * config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
>     prototype.
>     (__init_cpu_features): Likewise.
> ---
> libgcc/config/aarch64/cpuinfo.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> index 4b94fca8695..c62a7453e8e 100644
> --- a/libgcc/config/aarch64/cpuinfo.c
> +++ b/libgcc/config/aarch64/cpuinfo.c
> @@ -418,6 +418,7 @@ __init_cpu_features_constructor(unsigned long hwcap,
>    setCPUFeature(FEAT_INIT);
>  }
>
> +void __init_cpu_features_resolver(unsigned long, const __ifunc_arg_t *);
>  void
>  __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
>    if (__aarch64_cpu_features.features)
> @@ -425,6 +426,7 @@ __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
>    __init_cpu_features_constructor(hwcap, arg);
>  }
>
> +void __init_cpu_features(void);
>  void __attribute__ ((constructor))
>  __init_cpu_features(void) {
>    unsigned long hwcap;

I thought the intent of the missing-prototypes warning is to warn about
missing prototypes in a header file primarily.
Should these prototypes go into gcc/common/config/aarch64/cpuinfo.h instead?

Thanks,
Kyrill

> --
> 2.34.1
>
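To make the pattern under discussion concrete, here is a small sketch of a non-static constructor function preceded by its own declaration. All names (features_initialized, init_features_demo) are hypothetical and unrelated to cpuinfo.c. Note the assumption that in C++ the analogous GCC warning is -Wmissing-declarations, since -Wmissing-prototypes is documented as applying to C and Objective-C only; __attribute__ ((constructor)) is a GNU extension.

```cpp
#include <cassert>

// Hypothetical flag recording that the constructor function has run.
int features_initialized = 0;

// Declaration placed directly before the definition.  Without it, a
// non-static definition like the one below is what the missing-declaration
// style of warning complains about.
void init_features_demo(void);

// GNU extension: the function runs before main() is entered.
void __attribute__ ((constructor))
init_features_demo(void)
{
  features_initialized = 1;
}
```

A declaration in the same file silences the warning but is visible to no other translation unit, which is why the question of moving it to a shared header comes up in the review.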
Re: [PATCH 3/3] Record template specialization hash
On Thu, 3 Oct 2024, Jason Merrill wrote:

> On 10/2/24 7:53 AM, Richard Biener wrote:
> > For a specific testcase a lot of compile-time is spent in re-hashing
> > hashtable elements upon expansion.  The following records the hash
> > in the hash element.  This speeds up compilation by 20%.
> >
> > There's probably module-related uses that need to be adjusted.
> >
> > Bootstrap failed (guess I was expecting this), but still I think this
> > is a good idea - maybe somebody can pick it up.
>
> Applying the attached, thanks!
>
> > Possibly instead of having a single global hash table having one per ID
> > would be better.
>
> That sounds excessive to me.  Is the actual hashtable lookup significant in
> the profile?

No, it's still template hashing at the top.

> > The hashtable also keeps things GC-live ('args' for example).
>
> Those args should also be referenced by TI_ARGS from the respective template
> specialization.

I see.  The changes improved things.  The biggest fruit we can possibly
still reap is coerce_template_parameter_pack causing 6GB of transitional
garbage we could release earlier (the packed_args allocation at the start
of the function; the testcase ticks the last one,
packed_args = make_tree_vec (nargs - arg_idx)).

I've pasted the testcase below - it looks innocuous and I suspect
filling the templates with actual "meat" would shift the blame
from argument/type vectors to elsewhere?

clang++-17 just blew past my little machine's 32GB of memory so at least
we're not worst here.

Richard.
template struct Add {}; template struct Operand {}; template Operand operator+(const Operand&, const Operand&) { return {}; } auto stress_me(auto x) { return (x + x) + x + (x + (x + x) + x) + x + x; } auto apply_stress(auto op) { return stress_me(stress_me(stress_me(stress_me(stress_me(op); } template struct typelist {}; void invoke(auto); template void apply(typelist, auto&& fun) { fun(T0{}); if constexpr (sizeof...(Ts)) apply(typelist(), fun); } template Operand make_operand(T) { return {}; } auto pah() { apply(typelist(), [](auto op) { apply(typelist(), [&op](auto op2) { invoke(apply_stress(make_operand(op) + make_operand(op2))); }); }); }
[PATCH] aarch64: Fix bug with max/min (PR116934)
In ac4cdf5cb43c0b09e81760e2a1902ceebcf1a135, I introduced a bug where I put
the new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN, into the wrong
iterator.

I should have put the new unspecs in SVE_COND_FP_MAXMIN, but I put them in
SVE_COND_FP_BINARY_REG instead.  That was incorrect because
SVE_COND_FP_MAXMIN, not SVE_COND_FP_BINARY_REG, is the iterator used for
predicated floating-point maximum/minimum.

This patch also adds a testcase to validate the change.

Regression tested on aarch64-unknown-linux-gnu and found no regressions.
Some test cases with "libitm" in their directory names appear in the
compare_tests output as changed tests, but that looks like it is only
because the build directories differ, e.g.
build-patched/aarch64-unknown-linux-gnu/./libitm/* versus
build-pristine/aarch64-unknown-linux-gnu/./libitm/*.  I didn't think it was
a cause for concern and have pushed this for review.

gcc/ChangeLog:

    * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
    UNSPEC_COND_SMIN to correct iterators.

gcc/testsuite/ChangeLog:

    PR target/116934
    * gcc.target/aarch64/sve2/pr116934.c: New test.
---
 gcc/config/aarch64/iterators.md                  |  8 ++++----
 gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c | 13 +++++++++++++
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0836dee61c9..fcad236eee9 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3125,9 +3125,7 @@
 
 (define_int_iterator SVE_COND_FP_BINARY_REG
   [UNSPEC_COND_FDIV
-   UNSPEC_COND_FMULX
-   UNSPEC_COND_SMAX
-   UNSPEC_COND_SMIN])
+   UNSPEC_COND_FMULX])
 
 (define_int_iterator SVE_COND_FCADD [UNSPEC_COND_FCADD90
                                      UNSPEC_COND_FCADD270])
@@ -3135,7 +3133,9 @@
 (define_int_iterator SVE_COND_FP_MAXMIN [UNSPEC_COND_FMAX
                                          UNSPEC_COND_FMAXNM
                                          UNSPEC_COND_FMIN
-                                         UNSPEC_COND_FMINNM])
+                                         UNSPEC_COND_FMINNM
+                                         UNSPEC_COND_SMAX
+                                         UNSPEC_COND_SMIN])
 
 (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
                                           UNSPEC_COND_FMLS
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
new file mode 100644
index 000..94fb96ffa7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr116934.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -mcpu=neoverse-v2" } */
+
+int a;
+float *b;
+
+void foo() {
+  for (; a; a--, b += 4) {
+    b[0] = b[1] = b[2] = b[2] > 0 ?: 0;
+    if (b[3] < 0)
+      b[3] = 0;
+  }
+}
Re: [PATCH] i386: Fix up ix86_expand_int_compare with TImode comparisons of SUBREGs from V8{H,B}Fmode against zero [PR116921]
On Fri, Oct 4, 2024 at 12:12 PM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase ICEs, because the ix86_expand_int_compare
> optimization to use {,v}ptest assumes there are instructions for all
> 16-byte vector modes.  That isn't the case, we only have one for
> V16QI, V8HI, V4SI, V2DI, V1TI, V4SF and V2DF, not for
> V8HF nor V8BF.
>
> The following patch fixes that by using the V8HI instruction instead
> for those 2 modes.  tmp can't be a SUBREG, because it is SUBREG_REG
> of another SUBREG, so we don't need to worry about gen_lowpart
> failing.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-10-04  Jakub Jelinek
>
>     PR target/116921
>     * config/i386/i386-expand.cc (ix86_expand_int_compare): Add a SUBREG
>     to V8HImode from V8HFmode or V8BFmode before generating a ptest.
>
>     * gcc.target/i386/pr116921.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-expand.cc.jj 2024-10-03 17:27:28.328227793 +0200
> +++ gcc/config/i386/i386-expand.cc 2024-10-03 18:11:18.514076904 +0200
> @@ -3095,6 +3095,9 @@ ix86_expand_int_compare (enum rtx_code c
>        && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
>      {
>        tmp = SUBREG_REG (op0);
> +      if (GET_MODE_INNER (GET_MODE (tmp)) == HFmode
> +          || GET_MODE_INNER (GET_MODE (tmp)) == BFmode)
> +        tmp = gen_lowpart (V8HImode, tmp);
>        tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
>      }
>    else
> --- gcc/testsuite/gcc.target/i386/pr116921.c.jj 2024-10-03 18:16:36.368711747 +0200
> +++ gcc/testsuite/gcc.target/i386/pr116921.c 2024-10-03 18:17:25.702034243 +0200
> @@ -0,0 +1,12 @@
> +/* PR target/116921 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -msse4" } */
> +
> +long x;
> +_Float16 __attribute__((__vector_size__ (16))) f;
> +
> +void
> +foo (void)
> +{
> +  x -= !(__int128) (f / 2);
> +}
>
>     Jakub
>
[PATCH] aarch64: Handle SVE modes in aarch64_evpc_reencode
For Advanced SIMD modes, aarch64_evpc_reencode tests whether a permute in a narrow element mode can be done more cheaply in a wider mode. For example, { 0, 1, 8, 9, 4, 5, 12, 13 } on V8HI is a natural TRN1 on V4SI ({ 0, 4, 2, 6 }). This patch extends the code to handle SVE data and predicate modes as well. This is a prerequisite to getting good results for PR116583. Tested on aarch64-linux-gnu (with and without SVE enabled by default). I'll push on Monday if there are no comments before then. Thanks, Richard gcc/ PR target/116583 * config/aarch64/aarch64.cc (aarch64_coalesce_units): New function, extending the Advanced SIMD handling from... (aarch64_evpc_reencode): ...here to SVE data and predicate modes. gcc/testsuite/ PR target/116583 * gcc.target/aarch64/sve/permute_1.c: New test. * gcc.target/aarch64/sve/permute_2.c: Likewise. * gcc.target/aarch64/sve/permute_3.c: Likewise. * gcc.target/aarch64/sve/permute_4.c: Likewise. --- gcc/config/aarch64/aarch64.cc | 55 +++- .../gcc.target/aarch64/sve/permute_1.c| 106 +++ .../gcc.target/aarch64/sve/permute_2.c| 277 ++ .../gcc.target/aarch64/sve/permute_3.c| 91 ++ .../gcc.target/aarch64/sve/permute_4.c| 113 +++ 5 files changed, 633 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_4.c diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index e7bb3278a27..102680a0efc 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -1933,6 +1933,46 @@ aarch64_sve_int_mode (machine_mode mode) return aarch64_sve_data_mode (int_mode, GET_MODE_NUNITS (mode)).require (); } +/* Look for a vector mode with the same classification as VEC_MODE, + but with each group of FACTOR elements coalesced into a single element. 
+ In other words, look for a mode in which the elements are FACTOR times + larger and in which the number of elements is FACTOR times smaller. + + Return the mode found, if one exists. */ + +static opt_machine_mode +aarch64_coalesce_units (machine_mode vec_mode, unsigned int factor) +{ + auto elt_bits = vector_element_size (GET_MODE_BITSIZE (vec_mode), + GET_MODE_NUNITS (vec_mode)); + auto vec_flags = aarch64_classify_vector_mode (vec_mode); + if (vec_flags & VEC_SVE_PRED) +{ + if (known_eq (GET_MODE_SIZE (vec_mode), BYTES_PER_SVE_PRED)) + return aarch64_sve_pred_mode (elt_bits * factor); + return {}; +} + + scalar_mode new_elt_mode; + if (!int_mode_for_size (elt_bits * factor, false).exists (&new_elt_mode)) +return {}; + + if (vec_flags == VEC_ADVSIMD) +{ + auto mode = aarch64_simd_container_mode (new_elt_mode, + GET_MODE_BITSIZE (vec_mode)); + if (mode != word_mode) + return mode; +} + else if (vec_flags & VEC_SVE_DATA) +{ + poly_uint64 new_nunits; + if (multiple_p (GET_MODE_NUNITS (vec_mode), factor, &new_nunits)) + return aarch64_sve_data_mode (new_elt_mode, new_nunits); +} + return {}; +} + /* Implement TARGET_VECTORIZE_RELATED_MODE. */ static opt_machine_mode @@ -25731,26 +25771,23 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d) { expand_vec_perm_d newd; - if (d->vec_flags != VEC_ADVSIMD) + /* The subregs that we'd create are not supported for big-endian SVE; + see aarch64_modes_compatible_p for details. */ + if (BYTES_BIG_ENDIAN && (d->vec_flags & VEC_ANY_SVE)) return false; /* Get the new mode. Always twice the size of the inner and half the elements. 
*/ - poly_uint64 vec_bits = GET_MODE_BITSIZE (d->vmode); - unsigned int new_elt_bits = GET_MODE_UNIT_BITSIZE (d->vmode) * 2; - auto new_elt_mode = int_mode_for_size (new_elt_bits, false).require (); - machine_mode new_mode = aarch64_simd_container_mode (new_elt_mode, vec_bits); - - if (new_mode == word_mode) + machine_mode new_mode; + if (!aarch64_coalesce_units (d->vmode, 2).exists (&new_mode)) return false; vec_perm_indices newpermindices; - if (!newpermindices.new_shrunk_vector (d->perm, 2)) return false; newd.vmode = new_mode; - newd.vec_flags = VEC_ADVSIMD; + newd.vec_flags = d->vec_flags; newd.op_mode = newd.vmode; newd.op_vec_flags = newd.vec_flags; newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL; diff --git a/gcc/testsuite/gcc.target/aarch64/sve/permute_1.c b/gcc/testsuite/gcc.target/aarch64/sve/permute_1.c new file mode 100644 index 000..90aeef32188 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/permute_1.c @@ -0,0 +1,106 @@ +/* { dg-options "-O -msve-
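As an illustration of the re-encoding described in the cover text of this patch, the sketch below checks whether a selector over narrow elements can be rewritten as a selector over elements FACTOR times wider. The helper name shrink_permute is hypothetical; it only mirrors the idea behind the new_shrunk_vector call for fixed-length selectors and is not GCC's implementation, which works on encoded and possibly variable-length index vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: try to re-express PERM, a selector over narrow
// elements, as a selector WIDE over elements FACTOR times wider.
// Returns false if PERM cannot be expressed that way.
bool shrink_permute(const std::vector<unsigned>& perm, unsigned factor,
                    std::vector<unsigned>& wide)
{
  wide.clear();
  if (factor == 0 || perm.size() % factor != 0)
    return false;

  for (std::size_t i = 0; i < perm.size(); i += factor)
    {
      // Each group must start on a wide-element boundary...
      if (perm[i] % factor != 0)
        return false;
      // ...and select FACTOR consecutive narrow elements.
      for (unsigned j = 1; j < factor; ++j)
        if (perm[i + j] != perm[i] + j)
          return false;
      wide.push_back(perm[i] / factor);
    }
  return true;
}
```

Applied to the example from the description, { 0, 1, 8, 9, 4, 5, 12, 13 } with a factor of 2 yields { 0, 4, 2, 6 }, the V4SI selector mentioned there; a selector such as { 0, 2, ... } is rejected because its first pair is not consecutive.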
[PATCH] aarch64: Fix general permutes of svbfloat16_ts
Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC patch triggered an unrecognisable insn ICE for the svbfloat16_t tests. This was because the implementation of general two-vector permutes requires two TBLs and an ORR, with the ORR being represented as an unspec for floating-point modes. The associated pattern did not cover VNx8BF. Tested on aarch64-linux-gnu (with and without SVE enabled by default). I'll push on Monday if there are no comments before then. Thanks, Richard gcc/ * iterators.md (SVE_I): Move further up file. (SVE_F): New mode iterator. (SVE_ALL): Redefine in terms of SVE_I and SVE_F. * config/aarch64/aarch64-sve.md (*3): Extend to all SVE_F. gcc/testsuite/ * gcc.target/aarch64/sve/permute_5.c: New test. --- gcc/config/aarch64/aarch64-sve.md | 8 +++--- gcc/config/aarch64/iterators.md | 27 +-- .../gcc.target/aarch64/sve/permute_5.c| 10 +++ 3 files changed, 27 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/permute_5.c diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index ec1d059a2b1..90db51e51b9 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -6455,10 +6455,10 @@ (define_expand "@aarch64_frecps" ;; by providing this, but we need to use UNSPECs since rtx logical ops ;; aren't defined for floating-point modes. 
(define_insn "*3" - [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w") - (unspec:SVE_FULL_F - [(match_operand:SVE_FULL_F 1 "register_operand" "w") - (match_operand:SVE_FULL_F 2 "register_operand" "w")] + [(set (match_operand:SVE_F 0 "register_operand" "=w") + (unspec:SVE_F + [(match_operand:SVE_F 1 "register_operand" "w") + (match_operand:SVE_F 2 "register_operand" "w")] LOGICALF))] "TARGET_SVE" "\t%0.d, %1.d, %2.d" diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 0836dee61c9..0f19cae73c9 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -519,15 +519,20 @@ (define_mode_iterator SVE_PARTIAL_I [VNx8QI VNx4QI VNx2QI VNx4HI VNx2HI VNx2SI]) +;; All SVE integer vector modes. +(define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI +VNx8HI VNx4HI VNx2HI +VNx4SI VNx2SI +VNx2DI]) + +;; All SVE floating-point vector modes. +(define_mode_iterator SVE_F [VNx8HF VNx4HF VNx2HF +VNx8BF VNx4BF VNx2BF +VNx4SF VNx2SF +VNx2DF]) + ;; All SVE vector modes. -(define_mode_iterator SVE_ALL [VNx16QI VNx8QI VNx4QI VNx2QI - VNx8HI VNx4HI VNx2HI - VNx8HF VNx4HF VNx2HF - VNx8BF VNx4BF VNx2BF - VNx4SI VNx2SI - VNx4SF VNx2SF - VNx2DI - VNx2DF]) +(define_mode_iterator SVE_ALL [SVE_I SVE_F]) ;; All SVE 2-vector modes. (define_mode_iterator SVE_FULLx2 [VNx32QI VNx16HI VNx8SI VNx4DI @@ -549,12 +554,6 @@ (define_mode_iterator SVE_STRUCT [SVE_FULLx2 SVE_FULLx3 SVE_FULLx4]) ;; All SVE vector and structure modes. (define_mode_iterator SVE_ALL_STRUCT [SVE_ALL SVE_STRUCT]) -;; All SVE integer vector modes. 
-(define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI -VNx8HI VNx4HI VNx2HI -VNx4SI VNx2SI -VNx2DI]) - ;; All SVE integer vector modes and Advanced SIMD 64-bit vector ;; element modes (define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI]) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/permute_5.c b/gcc/testsuite/gcc.target/aarch64/sve/permute_5.c new file mode 100644 index 000..786b05ee3e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/permute_5.c @@ -0,0 +1,10 @@ +/* { dg-options "-O -msve-vector-bits=256" } */ + +typedef __SVBfloat16_t vbfloat16 __attribute__((arm_sve_vector_bits(256))); + +vbfloat16 +foo (vbfloat16 x, vbfloat16 y) +{ + return __builtin_shufflevector (x, y, 0, 2, 1, 3, 16, 19, 17, 18, + 8, 9, 10, 11, 23, 22, 21, 20); +} -- 2.25.1
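The "two TBLs and an ORR" scheme mentioned in the description can be modelled in a few lines. This is a sketch under stated assumptions, not the aarch64 expander: tbl and two_vector_permute are hypothetical names, lanes are modelled as raw 16-bit patterns (as bfloat16 lanes would be), and the table lookup is assumed to produce zero for out-of-range indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lanes are modelled as raw 16-bit patterns; selectors use the same type.
using Vec = std::vector<std::uint16_t>;

// Model of a table lookup in which out-of-range indices yield zero.
Vec tbl(const Vec& table, const Vec& sel)
{
  Vec out(sel.size(), 0);
  for (std::size_t i = 0; i < sel.size(); ++i)
    if (sel[i] < table.size())
      out[i] = table[sel[i]];
  return out;
}

// General two-vector permute: one lookup per input, then a bitwise OR.
Vec two_vector_permute(const Vec& x, const Vec& y, const Vec& sel)
{
  const std::uint16_t n = static_cast<std::uint16_t>(x.size());

  // Rebase the selector for the second input.  Indices that referred to X
  // wrap around to large values and so fall out of range for Y.
  Vec sel_y(sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i)
    sel_y[i] = static_cast<std::uint16_t>(sel[i] - n);

  const Vec from_x = tbl(x, sel);
  const Vec from_y = tbl(y, sel_y);

  // In each lane exactly one of the two lookups is nonzero-capable, so OR
  // combines them.  The OR acts on bit patterns, not on numeric values.
  Vec out(sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i)
    out[i] = static_cast<std::uint16_t>(from_x[i] | from_y[i]);
  return out;
}
```

Because the final step is a bitwise operation on floating-point data, it has no natural rtx code for floating-point modes, which is why the pattern touched by this patch is expressed as an unspec.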
[PATCH] x86: Disable stack protector for naked functions
Since naked functions should not enable stack protector, define
TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
for naked functions.

gcc/

    PR target/116962
    * config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
    function.
    (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.

gcc/testsuite/

    PR target/116962
    * gcc.target/i386/pr116962.c: New file.

OK for master?

Thanks.

--
H.J.

From 99ab364f6657c2d2e5e4a389b07b00c12d4bad0d Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 4 Oct 2024 16:21:15 +0800
Subject: [PATCH] x86: Disable stack protector for naked functions

Since naked functions should not enable stack protector, define
TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
for naked functions.

gcc/

    PR target/116962
    * config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
    function.
    (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.

gcc/testsuite/

    PR target/116962
    * gcc.target/i386/pr116962.c: New file.

Signed-off-by: H.J. Lu
---
 gcc/config/i386/i386.cc                  | 11 +++++++++++
 gcc/testsuite/gcc.target/i386/pr116962.c | 10 ++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116962.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad2e7b447ff..90a564b2ffa 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24435,6 +24435,13 @@ ix86_stack_protect_guard (void)
   return default_stack_protect_guard ();
 }
 
+static bool
+ix86_stack_protect_runtime_enabled_p (void)
+{
+  /* Naked functions should not enable stack protector.  */
+  return !ix86_function_naked (current_function_decl);
+}
+
 /* For 32-bit code we can save PIC register setup by using
    __stack_chk_fail_local hidden function instead of calling
    __stack_chk_fail directly.  64-bit code doesn't need to setup any PIC
@@ -26821,6 +26828,10 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
+#undef TARGET_STACK_PROTECT_RUNTIME_ENABLED_P
+#define TARGET_STACK_PROTECT_RUNTIME_ENABLED_P \
+  ix86_stack_protect_runtime_enabled_p
+
 #if !TARGET_MACHO
 #undef TARGET_STACK_PROTECT_FAIL
 #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
diff --git a/gcc/testsuite/gcc.target/i386/pr116962.c b/gcc/testsuite/gcc.target/i386/pr116962.c
new file mode 100644
index 000..ced16eee746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116962.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target fstack_protector } } */
+/* { dg-options "-O2 -fstack-protector-all" } */
+/* { dg-final { scan-assembler-not "__stack_chk_fail" } } */
+
+__attribute__ ((naked))
+void
+foo (void)
+{
+  asm ("ret");
+}
--
2.46.2
[PATCH] Fixup dumping of re-trying without/with single-lane SLP
The following fixes the order of decrementing the SLP mode and the
dumping.

Built on x86_64-unknown-linux-gnu, pushed.

    * tree-vect-loop.cc (vect_analyze_loop_2): Decrement 'slp'
    before dumping which stage we're starting.
---
 gcc/tree-vect-loop.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3a9eca289d8..3d62fecfae1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3275,6 +3275,9 @@ again:
         }
     }
 
+  /* Roll back state appropriately.  Degrade SLP this time.  From multi-
+     to single-lane to disabled.  */
+  --slp;
   if (dump_enabled_p ())
     {
       if (slp)
@@ -3285,9 +3288,6 @@ again:
                          "re-trying with SLP disabled\n");
     }
 
-  /* Roll back state appropriately.  Degrade SLP this time.  From multi-
-     to single-lane to disabled.  */
-  --slp;
   /* Restore vectorization factor as it were without SLP.  */
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = saved_vectorization_factor;
   /* Free the SLP instances.  */
--
2.43.0
Re: [PATCH 1/2] c++: add -Wdeprecated-literal-operator [CWG2521]
On Fri, Oct 04, 2024 at 12:19:03PM +0200, Jakub Jelinek wrote: > Though, maybe the tests should have both the deprecated syntax and the > non-deprecated one... Here is a variant of the patch which does that. Tested on x86_64-linux and i686-linux, ok for trunk? 2024-10-04 Jakub Jelinek * g++.dg/cpp26/unevalstr1.C: Revert the 2024-10-03 changes, instead expect extra warnings. Add another set of tests without space between " and _. * g++.dg/cpp26/unevalstr2.C: Expect extra warnings for C++23. Add another set of tests without space between " and _. --- gcc/testsuite/g++.dg/cpp26/unevalstr1.C.jj 2024-10-04 12:28:08.820899177 +0200 +++ gcc/testsuite/g++.dg/cpp26/unevalstr1.C 2024-10-04 14:15:35.563531334 +0200 @@ -83,21 +83,57 @@ extern "\o{0103}" { int f14 (); } // { d [[nodiscard ("\x{20}")]] int h19 (); // { dg-error "numeric escape sequence in unevaluated string" } [[nodiscard ("\h")]] int h20 (); // { dg-error "unknown escape sequence" } -float operator ""_my0 (const char *); -float operator "" ""_my1 (const char *); -float operator L""_my2 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u""_my3 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator U""_my4 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u8""_my5 (const char *);// { dg-error "invalid encoding prefix in literal operator" } -float operator L"" ""_my6 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u"" ""_my7 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator U"" ""_my8 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator u8"" ""_my9 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" L""_my10 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" u""_my11 
(const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" U""_my12 (const char *); // { dg-error "invalid encoding prefix in literal operator" } -float operator "" u8""_my13 (const char *);// { dg-error "invalid encoding prefix in literal operator" } -float operator "\0"_my14 (const char *); // { dg-error "expected empty string after 'operator' keyword" } -float operator "\x00"_my15 (const char *); // { dg-error "expected empty string after 'operator' keyword" } -float operator "\h"_my16 (const char *); // { dg-error "expected empty string after 'operator' keyword" } +float operator "" _my0 (const char *); +float operator "" "" _my1 (const char *); +float operator L"" _my2 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator u"" _my3 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator U"" _my4 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator u8"" _my5 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator L"" "" _my6 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator u"" "" _my7 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator U"" "" _my8 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator u8"" "" _my9 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" L"" _my10 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" u"" _my11 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" U"" _my12 (const char *);// { dg-error "invalid encoding prefix in literal operator" } +float operator "" u8"" _my13 (const char *); // { dg-error "invalid encoding prefix in literal operator" } +float operator "\0" _my14 (const char *); 
// { dg-error "expected empty string after 'operator' keyword" } +float operator "\x00" _my15 (const char *);// { dg-error "expected empty string after 'operator' keyword" } +float operator "\h" _my16 (const char *); // { dg-error "expected empty string after 'operator' keyword" } + // { dg-error "unknown escape sequence" "" { target *-*-* } .-1 } +// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 } +// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 } +// { dg-warning "space between quotes and suffix is deprecated" "" { target *-*-* } .-18 } +// { dg-warning "space between quotes and suffix is deprecated" "" {
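For context on what the extra warnings in these tests are about, here is a short sketch of the two spellings of a literal operator. The suffixes _km and _mi are hypothetical examples. The assumption, based on CWG2521 as referenced in the subject line, is that the form with whitespace between the quotes and the suffix is the deprecated one, while the form without whitespace is preferred.

```cpp
#include <cassert>

// Preferred spelling: no whitespace between "" and the suffix.
constexpr long double operator""_km(long double v)
{
  return v * 1000.0L;
}

// Deprecated spelling (still accepted, but expected to draw the
// "space between quotes and suffix is deprecated" warning when
// -Wdeprecated-literal-operator is active), shown only as a comment:
//
//   constexpr long double operator"" _mi(long double v);
```

The tests above keep both spellings so that each diagnostic is exercised with and without the deprecation warning.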
Re: [PATCH] libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
On Fri, Oct 04, 2024 at 12:52:11PM +0100, Jonathan Wakely wrote:
> This doesn't really belong in our testsuite, because the sole purpose of
> the new test is to find bugs in the Glibc wrappers (like the one linked
> below). But maybe it's a kindness to do it in our testsuite, because we
> already have this test in place, and one Glibc bug was already found
> thanks to Sam running the existing test with _FORTIFY_SOURCE defined.
>
> Should we do this?

I think so.  While those bugs are glibc bugs, libstdc++ uses libc headers
and so if they have namespace cleanness issues, so does libstdc++.

> Add a new testcase that repeats 17_intro/names.cc but with
> _FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
> https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
> now).
>
> libstdc++-v3/ChangeLog:
>
>     PR libstdc++/116210
>     * testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
>     that use it in the fortify wrappers.
>     * testsuite/17_intro/names_fortify.cc: New test.

    Jakub
[PATCH 3/4] vect: Support more VLA SLP permutations
This is the main patch for PR116583. Previously, we only supported VLA SLP permutations for which the output and inputs have the same number of lanes, and for which that number of lanes divides the number of vector elements. The patch extends this to handle: (1) "packs" of a single 2N-vector input into an N-vector output (2) "unpacks" of N-vector inputs into an XN-vector output Hopefully the comments in the code explain the approach. The contents of the: for (unsigned i = 0; i < ncopies; ++i) loop do not change; the patch simply adds an outer loop around it. The patch removes the XFAIL in slp-13.c and also improves the SVE vect.exp results with vect-force-slp=1. I haven't added new tests specifically for this, since presumably the existing ones will cover it once the SLP switch is flipped. gcc/ PR tree-optimization/PR116583 * tree-vect-slp.cc (vectorizable_slp_permutation_1): Handle variable-length pack and unpack permutations. gcc/testsuite/ PR tree-optimization/PR116583 * gcc.dg/vect/slp-13.c: Remove xfail for vect_variable_length. * gcc.dg/vect/slp-13-big-array.c: Likewise. --- gcc/testsuite/gcc.dg/vect/slp-13-big-array.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-13.c | 2 +- gcc/tree-vect-slp.cc | 107 ++- 3 files changed, 82 insertions(+), 29 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c index ca70856c1dd..e45f8aab133 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c +++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c @@ -137,4 +137,4 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! 
vect_pack_trunc } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c b/gcc/testsuite/gcc.dg/vect/slp-13.c index b7f947e6dbe..d6346aef978 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-13.c +++ b/gcc/testsuite/gcc.dg/vect/slp-13.c @@ -131,4 +131,4 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 470128ea775..66f5906ebb9 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -10194,6 +10194,13 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi, unsigned i; poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node)); + /* True if we're permuting a single input of 2N vectors down + to N vectors. This case doesn't generalize beyond 2 since + VEC_PERM_EXPR only takes 2 inputs. 
*/ + bool pack_p = false; + /* If we're permuting inputs of N vectors each into X*N outputs, + this is the value of X, otherwise it is 1. */ + unsigned int unpack_factor = 1; tree op_vectype = NULL_TREE; FOR_EACH_VEC_ELT (children, i, child) if (SLP_TREE_VECTYPE (child)) @@ -10215,7 +10222,20 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi, "Unsupported vector types in lane permutation\n"); return -1; } - if (SLP_TREE_LANES (child) != SLP_TREE_LANES (node)) + auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype); + unsigned int this_unpack_factor; + /* Check whether the input has twice as many lanes per vector. */ + if (children.length () == 1 + && known_eq (SLP_TREE_LANES (child) * nunits, + SLP_TREE_LANES (node) * op_nunits * 2)) + pack_p = true; + /* Check whether the output has N times as many lanes per vector. */ + else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits, + SLP_TREE_LANES (child) * nunits, +
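The two lane-count checks visible in the hunk above can be tried out with plain integers. This is only a sketch: PermuteShape and classify_permute are hypothetical names, fixed unsigned values stand in for the poly_uint64 quantities used by the vectorizer, and because the quoted diff is cut off, the sketch does not model whatever happens when neither check succeeds (it simply leaves the unpack factor at 1).

```cpp
#include <cassert>

// Hypothetical result type: which of the two new cases applies.
struct PermuteShape
{
  bool pack_p;             // single input with twice as many lanes per vector
  unsigned unpack_factor;  // X if the output has X times the lanes per vector
};

// child_lanes / op_nunits describe the input node and its vector type,
// node_lanes / nunits describe the output node and its vector type.
PermuteShape classify_permute(unsigned num_children,
                              unsigned child_lanes, unsigned op_nunits,
                              unsigned node_lanes, unsigned nunits)
{
  PermuteShape shape = { false, 1 };

  // "Pack": the input has twice as many lanes per vector as the output.
  if (num_children == 1
      && child_lanes * nunits == node_lanes * op_nunits * 2)
    shape.pack_p = true;
  else
    {
      // "Unpack": the output has a whole multiple of the input's lanes
      // per vector.
      const unsigned num = node_lanes * op_nunits;
      const unsigned den = child_lanes * nunits;
      if (den != 0 && num % den == 0)
        shape.unpack_factor = num / den;
    }
  return shape;
}
```

For instance, a single 8-lane input feeding a 4-lane output with equal element counts per vector is classified as a pack, while a 4-lane input feeding an 8-lane output gives an unpack factor of 2.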
[PATCH 2/4] vect: Restructure repeating_p case for SLP permutations
The repeating_p case previously handled the specific situation in
which the inputs have N lanes and the output has N lanes, where N
divides the number of vector elements.  In that case, every output
uses the same permute vector.

The code was therefore structured so that the outer loop only
constructed one permute vector, with an inner loop generating as
many VEC_PERM_EXPRs from it as required.

However, the main patch for PR116583 adds support for cycling
through N permute vectors, rather than just having one.  The current
structure doesn't really handle that case well.  (We'd need to
interleave the results after generating them, which sounds a bit
fragile.)

This patch instead makes the transform phase calculate each output
vector's permutation explicitly, like for the !repeating_p path.
As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.

This arguably undermines one of the justifications for using
repeating_p for constant-length vectors: that the repeating_p
path involved less work than the !repeating_p path.  That
justification does still hold for the analysis phase, though,
and that should be the more time-sensitive part.  And the other
justification -- to get more coverage of the code -- still applies.
So I'd prefer that we continue to use repeating_p for constant-length
vectors unless that causes a known missed optimisation.

gcc/
        PR tree-optimization/116583
        * tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
        the noutputs_per_mask inner loop and instead generate a
        separate permute vector for each output.
---
 gcc/tree-vect-slp.cc | 75
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7aeda69f447..470128ea775 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10243,26 +10243,33 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       return 1;
     }
 
-  /* REPEATING_P is true if every output vector is guaranteed to use the
-     same permute vector.  We can handle that case for both variable-length
-     and constant-length vectors, but we only handle other cases for
-     constant-length vectors.
+  /* Set REPEATING_P to true if every output uses the same permute vector
+     and if we can generate the vectors in a vector-length agnostic way.
+
+     When REPEATING_P is true, NOUTPUTS holds the total number of outputs
+     that we actually need to generate.  */
+  uint64_t noutputs = 0;
+  loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!linfo
+      || !constant_multiple_p (LOOP_VINFO_VECT_FACTOR (linfo)
+                               * SLP_TREE_LANES (node), nunits, &noutputs))
+    repeating_p = false;
+
+  /* We can handle the conditions described for REPEATING_P above for
+     both variable- and constant-length vectors.  The fallback requires
+     us to generate every element of every permute vector explicitly,
+     which is only possible for constant-length permute vectors.
 
      Set:
 
      - NPATTERNS and NELTS_PER_PATTERN to the encoding of the permute
-       mask vector that we want to build.
+       mask vectors that we want to build.
 
      - NCOPIES to the number of copies of PERM that we need in order
-       to build the necessary permute mask vectors.
-
-     - NOUTPUTS_PER_MASK to the number of output vectors we want to create
-       for each permute mask vector.  This is only relevant when GSI is
-       nonnull.  */
+       to build the necessary permute mask vectors.  */
   uint64_t npatterns;
   unsigned nelts_per_pattern;
   uint64_t ncopies;
-  unsigned noutputs_per_mask;
   if (repeating_p)
     {
       /* We need a single permute mask vector that has the form:
@@ -10274,7 +10281,6 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
          that we use for permutes requires 3n elements.  */
       npatterns = SLP_TREE_LANES (node);
       nelts_per_pattern = ncopies = 3;
-      noutputs_per_mask = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
     }
   else
     {
@@ -10284,10 +10290,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
           || !TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
         return -1;
       nelts_per_pattern = ncopies = 1;
-      if (loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo))
-        if (!LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
-          return -1;
-      noutputs_per_mask = 1;
+      if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
+        return -1;
     }
   unsigned olanes = ncopies * SLP_TREE_LANES (node);
   gcc_assert (repeating_p || multiple_p (olanes, nunits));
@@ -10364,16 +10368,24 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       mask.quick_grow (count);
       vec_perm_indices indices;
       unsigned nperms = 0;
-      for (unsigned i = 0; i < vperm.length (); ++i)
-        {
-          mask_element = vperm[i].second
[PATCH 1/4] vect: Variable lane indices in vectorizable_slp_permutation_1
The main patch for PR116583 needs to create variable indices into
an input vector.  This pre-patch changes the types to allow that.

There is no pretty-print format for poly_uint64 because of issues
with passing C++ objects through "...".

gcc/
        PR tree-optimization/116583
        * tree-vect-slp.cc (vectorizable_slp_permutation_1): Using
        poly_uint64 for scalar lane indices.
---
 gcc/tree-vect-slp.cc | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 482b9d50496..7aeda69f447 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10296,8 +10296,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
      from the { SLP operand, scalar lane } permutation as recorded in the
      SLP node as intermediate step.  This part should already work
      with SLP children with arbitrary number of lanes.  */
-  auto_vec<std::pair<std::pair<unsigned, unsigned>, unsigned> > vperm;
-  auto_vec<unsigned> active_lane;
+  auto_vec<std::pair<std::pair<unsigned, unsigned>, poly_uint64>> vperm;
+  auto_vec<poly_uint64> active_lane;
   vperm.create (olanes);
   active_lane.safe_grow_cleared (children.length (), true);
   for (unsigned i = 0; i < ncopies; ++i)
@@ -10312,8 +10312,9 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
             {
               /* We checked above that the vectors are constant-length.  */
               unsigned vnunits = TYPE_VECTOR_SUBPARTS (vtype).to_constant ();
-              unsigned vi = (active_lane[p.first] + p.second) / vnunits;
-              unsigned vl = (active_lane[p.first] + p.second) % vnunits;
+              unsigned lane = active_lane[p.first].to_constant ();
+              unsigned vi = (lane + p.second) / vnunits;
+              unsigned vl = (lane + p.second) % vnunits;
               vperm.quick_push ({{p.first, vi}, vl});
             }
         }
@@ -10339,9 +10340,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
                       ? multiple_p (i, npatterns)
                       : multiple_p (i, TYPE_VECTOR_SUBPARTS (vectype))))
                 dump_printf (MSG_NOTE, ",");
-              dump_printf (MSG_NOTE, " vops%u[%u][%u]",
-                           vperm[i].first.first, vperm[i].first.second,
-                           vperm[i].second);
+              dump_printf (MSG_NOTE, " vops%u[%u][",
+                           vperm[i].first.first, vperm[i].first.second);
+              dump_dec (MSG_NOTE, vperm[i].second);
+              dump_printf (MSG_NOTE, "]");
             }
           dump_printf (MSG_NOTE, "\n");
         }
-- 
2.25.1
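
To illustrate the dump change in the patch above, here is a small
standalone sketch.  It is not GCC code: ToyPoly, toy_dump_dec and
describe_lane are hypothetical names, and the bracketed output format is
invented for the example.  The idea it models is only that a lane index
which may depend on the runtime vector length cannot be handed to a
printf-style "%u" conversion, so the message is assembled in pieces with
a dedicated formatter in the middle.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for a lane index that may scale with the runtime vector
// length: value = c0 + c1 * x.  Hypothetical; this is not GCC's poly_uint64.
struct ToyPoly
{
  std::uint64_t c0;  // constant part
  std::uint64_t c1;  // multiple of the runtime parameter x

  bool is_constant () const { return c1 == 0; }

  // Only valid when the value has no runtime-dependent part.
  std::uint64_t to_constant () const
  {
    assert (is_constant ());
    return c0;
  }
};

// Hypothetical formatter playing the role that dump_dec plays in the patch.
// The "[c0,c1]" format is an assumption made for this sketch.
std::string toy_dump_dec (const ToyPoly &p)
{
  if (p.is_constant ())
    return std::to_string (p.c0);
  return "[" + std::to_string (p.c0) + "," + std::to_string (p.c1) + "]";
}

// Builds " vopsA[B][lane]" in three pieces, mirroring the way the patch
// splits one dump_printf call into dump_printf / dump_dec / dump_printf.
std::string describe_lane (unsigned op, unsigned vec, const ToyPoly &lane)
{
  std::string out
    = " vops" + std::to_string (op) + "[" + std::to_string (vec) + "][";
  out += toy_dump_dec (lane);
  out += "]";
  return out;
}
```

With this sketch, a constant lane such as ToyPoly{3, 0} prints as a plain
number, while a length-dependent one such as ToyPoly{4, 4} goes through
the same formatter without needing a new format specifier.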
[PATCH 0/4] Support more VLA SLP permutations
This series should fix the target-independent parts of PR116583.
(We also need some target-specific patches, to be posted separately.)

The explanations are in the individual commit messages, but I've
attached a -b diff below in case my attempt to split the patch up
has just obfuscated things instead.

Tested on aarch64-linux-gnu (with and without SVE enabled by default)
and x86_64-linux-gnu.  Also tested by running the vect testsuite
with vect-force-slp=1.

Richard Sandiford (4):
  vect: Variable lane indices in vectorizable_slp_permutation_1
  vect: Restructure repeating_p case for SLP permutations
  vect: Support more VLA SLP permutations
  vect: Add more dump messages for VLA SLP permutation

 gcc/testsuite/gcc.dg/vect/slp-13-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/slp-13.c | 2 +-
 gcc/tree-vect-slp.cc | 190 +--
 3 files changed, 134 insertions(+), 60 deletions(-)

-- 
2.25.1

diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
index ca70856c1dd..e45f8aab133 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
@@ -137,4 +137,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { !
vect_pack_trunc } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c b/gcc/testsuite/gcc.dg/vect/slp-13.c index b7f947e6dbe..d6346aef978 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-13.c +++ b/gcc/testsuite/gcc.dg/vect/slp-13.c @@ -131,4 +131,4 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 482b9d50496..56fb55cb628 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -10194,6 +10194,13 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi, unsigned i; poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node)); + /* True if we're permuting a single input of 2N vectors down + to N vectors. This case doesn't generalize beyond 2 since + VEC_PERM_EXPR only takes 2 inputs. 
*/ + bool pack_p = false; + /* If we're permuting inputs of N vectors each into X*N outputs, + this is the value of X, otherwise it is 1. */ + unsigned int unpack_factor = 1; tree op_vectype = NULL_TREE; FOR_EACH_VEC_ELT (children, i, child) if (SLP_TREE_VECTYPE (child)) @@ -10215,7 +10222,20 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi, "Unsupported vector types in lane permutation\n"); return -1; } - if (SLP_TREE_LANES (child) != SLP_TREE_LANES (node)) + auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype); + unsigned int this_unpack_factor; + /* Check whether the input has twice as many lanes per vector. */ + if (children.length () == 1 + && known_eq (SLP_TREE_LANES (child) * nunits, + SLP_TREE_LANES (node) * op_nunits * 2)) + pack_p = true; + /* Check whether the output has N times as many lanes per vector. */ + else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits, + SLP_TREE_LANES (child) * nunits, + &this_unpack_factor) + && (i == 0 || unpack_factor == this_unpack_factor)) + unpack_factor = this_unpack_factor; + else repeating_p = false; } @@ -10243,29 +10263,47 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi, return 1; } - /* REPEATING_P is true if every output vector is guaranteed to use the - same permute vector. We can han
[PATCH 4/4] vect: Add more dump messages for VLA SLP permutation
Taking the !repeating_p route for VLA vectors causes analysis to fail,
but it wasn't clear from the dump files when this had happened,
and which node caused it.

gcc/
        PR tree-optimization/116583
        * tree-vect-slp.cc (vectorizable_slp_permutation_1): Add more
        dump messages.
---
 gcc/tree-vect-slp.cc | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 66f5906ebb9..56fb55cb628 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10319,10 +10319,22 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
          instead of relying on the pattern described above.  */
       if (!nunits.is_constant (&npatterns)
           || !TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
-        return -1;
+        {
+          if (dump_p)
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "unsupported permutation %p on variable-length"
+                             " vectors\n", (void *) node);
+          return -1;
+        }
       nelts_per_pattern = ncopies = 1;
       if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
-        return -1;
+        {
+          if (dump_p)
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "unsupported permutation %p for variable VF\n",
+                             (void *) node);
+          return -1;
+        }
       pack_p = false;
       unpack_factor = 1;
     }
-- 
2.25.1
Re: [PATCH] libstdc++: Unroll loop in load_bytes function
On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote:
>
> On Fri, 4 Oct 2024 at 07:53, Richard Biener wrote:
> >
> > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote:
> > >
> > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote:
> > > >
> > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin wrote:
> > > > >
> > > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > > >
> > > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > > >
> > > > > libstdc++-v3/ChangeLog:
> > > > >
> > > > >         * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > > >         loop using switch statement.
> > > > >
> > > > > Signed-off-by: Dmitry Ilvokhin
> > > > > ---
> > > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > index 3665375096a..294a7323dd0 100644
> > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > @@ -50,10 +50,29 @@ namespace
> > > > >    load_bytes(const char* p, int n)
> > > > >    {
> > > > >      std::size_t result = 0;
> > > > > -    --n;
> > > > > -    do
> > > > > -      result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > > -    while (--n >= 0);
> > > >
> > > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > > only hash the first 8 bytes.
> > >
> > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> >
> > The compiler should do such transforms - you probably want to tell
> > it that n < 8 though, it likely doesn't (always) know.
>
> e.g. like this?
>
>     if ((n & 7) != n)
>       __builtin_unreachable();
>
> For the microbenchmark that seems to make things consistently worse:
> https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw

Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <= n
is always true too.
Re: [PATCH] libstdc++: Unroll loop in load_bytes function
On Fri, 4 Oct 2024 at 07:53, Richard Biener wrote:
>
> On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote:
> >
> > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote:
> > >
> > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin wrote:
> > > >
> > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > >
> > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > >         * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > >         loop using switch statement.
> > > >
> > > > Signed-off-by: Dmitry Ilvokhin
> > > > ---
> > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > index 3665375096a..294a7323dd0 100644
> > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > @@ -50,10 +50,29 @@ namespace
> > > >    load_bytes(const char* p, int n)
> > > >    {
> > > >      std::size_t result = 0;
> > > > -    --n;
> > > > -    do
> > > > -      result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > -    while (--n >= 0);
> > >
> > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > only hash the first 8 bytes.
> >
> > Ah, but it's only ever called with load_bytes(end, len & 0x7)
>
> The compiler should do such transforms - you probably want to tell
> it that n < 8 though, it likely doesn't (always) know.

e.g. like this?

    if ((n & 7) != n)
      __builtin_unreachable();

For the microbenchmark that seems to make things consistently worse:
https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
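
For readers following the thread, the two shapes being compared can be
sketched as below.  This is only an illustration: the loop form matches
the removed lines quoted above, but the switch form is a guess at the
general technique, not the code from the posted patch, and the names
load_bytes_loop, load_bytes_switch and forms_agree are made up here.
The range hint uses __builtin_unreachable, which is a GCC/Clang builtin,
so the sketch assumes one of those compilers.

```cpp
#include <cassert>
#include <cstddef>

// Loop form, as in the removed lines: folds p[n-1] .. p[0] into an
// integer, with p[n-1] ending up in the most significant position.
// Requires n >= 1.
inline std::size_t load_bytes_loop (const char *p, int n)
{
  std::size_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char> (p[n]);
  while (--n >= 0);
  return result;
}

// Hand-unrolled form (illustrative): one case per tail length, each
// falling through to the next shorter length.
inline std::size_t load_bytes_switch (const char *p, int n)
{
  // Range hint of the kind discussed in the thread (assumes 1 <= n < 8).
  if (!(1 <= n && n < 8))
    __builtin_unreachable ();

  std::size_t result = 0;
  switch (n)
    {
    case 7: result = (result << 8) + static_cast<unsigned char> (p[6]);
      /* fall through */
    case 6: result = (result << 8) + static_cast<unsigned char> (p[5]);
      /* fall through */
    case 5: result = (result << 8) + static_cast<unsigned char> (p[4]);
      /* fall through */
    case 4: result = (result << 8) + static_cast<unsigned char> (p[3]);
      /* fall through */
    case 3: result = (result << 8) + static_cast<unsigned char> (p[2]);
      /* fall through */
    case 2: result = (result << 8) + static_cast<unsigned char> (p[1]);
      /* fall through */
    case 1: result = (result << 8) + static_cast<unsigned char> (p[0]);
    }
  return result;
}

// Checks that both forms agree for every tail length 1..7.
// P must point to at least 7 readable bytes.
inline bool forms_agree (const char *p)
{
  for (int n = 1; n < 8; ++n)
    if (load_bytes_loop (p, n) != load_bytes_switch (p, n))
      return false;
  return true;
}
```

Whether the unrolled form or the hinted loop is actually faster is
exactly what the microbenchmarks linked above are probing; the sketch
only shows that the two are interchangeable for 1 <= n < 8.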
Re: [PATCH] middle-end: reorder masking priority of math functions
On 10/4/24 09:32, Tamar Christina wrote:

Hi Victor,

-----Original Message-----
From: Victor Do Nascimento
Sent: Wednesday, October 2, 2024 5:26 PM
To: gcc-patches@gcc.gnu.org
Cc: Tamar Christina ; richard.guent...@gmail.com; Victor Do Nascimento
Subject: [PATCH] middle-end: reorder masking priority of math functions

Given the categorization of math built-in functions as `ECF_CONST',
when if-converting their uses, their calls are not masked and are thus
called with an all-true predicate.

This, however, is not appropriate where built-ins have library
equivalents, wherein they may exhibit highly architecture-specific
behaviors.  For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.

As numerical simulation programs often do bounds check on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code.  If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.

We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.

I think this makes sense to me from a technical standpoint and from an
SVE one.  Though I think the original order may have been there because
of the assumption that on some uarches unpredicated implementations are
faster than predicated ones.  So there may be some concerns about this
order being slower for some.  I'll leave it up to Richi since e.g. I
don't know the perf characteristics of the x86 variants here, but if
there is a concern you could use the conditional_operation_is_expensive
target hook to decide on the preferred order.

But other than that the change itself looks good to me, but you still
need approval.

Cheers,
Tamar

Thank you very much for your input here, Tamar.

Yes, I do agree that this solution may well not be the best path
forward for all architectures and that is something that has indeed
crossed my mind before.  Nonetheless, I did think that the best way to
get further feedback on the matter was to present this initial proposal
to which others could respond as they saw fit regarding the performance
characteristics in other architectures.

Let's see what Richi has to say.  If necessary we can, as you rightly
suggested, resort to the use of the `conditional_operation_is_expensive'
target hook.

Many thanks once again,
Victor

Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
new regressions.

gcc/ChangeLog:

        * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
        function declaration before IFN fallback.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/vect-fncall-mask-math.c: New.
---
 .../gcc.dg/vect/vect-fncall-mask-math.c | 33 +++
 gcc/tree-if-conv.cc | 18 +-
 2 files changed, 42 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
new file mode 100644
index 000..15e22da2807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
@@ -0,0 +1,33 @@
+/* Test the correct application of masking to autovectorized math function calls.
+   Test is currently set to xfail pending the release of the relevant lmvec
+   support. */
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */
+
+#include <math.h>
+
+const int N = 20;
+const float lim = 101.0;
+const float cst = -1.0;
+float tot = 0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+                [10 ... 19] = 100.0 };    /* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+    {
+      if (a[i] > lim)
+        b[i] = cst;
+      else
+        b[i] = expf (a[i]);
+      tot += b[i];
+    }
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { xfail { aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} ifcvt { xfail { aarch64*-*-* } } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..90c754a4814 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
     case GIMPLE_CALL:
       {
-        /* There are some IFN_s that are used to replace builtins but have
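
The cost argument in the commit message above (an unmasked call
"effectively executing both if and else branches") can be modelled with
a small scalar sketch.  Nothing here is GCC or libmvec code: the
counters, counted_expf, convert_unmasked and convert_masked are
hypothetical names, each loop iteration stands in for one vector lane,
and the sample inputs are chosen for the illustration rather than taken
from the testcase.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const float kLimit = 101.0f;    // plays the role of `lim' in the testcase
const float kDefault = -1.0f;   // plays the role of `cst'

struct CallCounter
{
  int total = 0;         // every call made to the math routine
  int out_of_range = 0;  // calls the original scalar code would have skipped
};

// Hypothetical instrumented math routine.  Inputs above kLimit model the
// values that a vector library might hand to a slow special-case path.
float counted_expf (float x, CallCounter &c)
{
  ++c.total;
  if (x > kLimit)
    ++c.out_of_range;
  return std::exp (x);
}

// Unmasked if-conversion: the call runs for every lane and the result is
// selected afterwards.
std::vector<float> convert_unmasked (const std::vector<float> &a,
                                     CallCounter &c)
{
  std::vector<float> b (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      float e = counted_expf (a[i], c);
      b[i] = a[i] > kLimit ? kDefault : e;
    }
  return b;
}

// Masked if-conversion: the call only runs for lanes where the else
// branch is actually taken.
std::vector<float> convert_masked (const std::vector<float> &a,
                                   CallCounter &c)
{
  std::vector<float> b (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      bool active = !(a[i] > kLimit);
      b[i] = active ? counted_expf (a[i], c) : kDefault;
    }
  return b;
}

// Ten lanes that take the "if" branch, ten that take the "else" branch.
std::vector<float> sample_input ()
{
  std::vector<float> a (20, 1.0f);
  for (int i = 0; i < 10; ++i)
    a[i] = 1000.0f;
  return a;
}

// Runs the sample through one of the two forms and returns the counters.
CallCounter count_calls (bool masked)
{
  CallCounter c;
  std::vector<float> a = sample_input ();
  if (masked)
    convert_masked (a, c);
  else
    convert_unmasked (a, c);
  return c;
}

// Both forms must produce the same output values.
bool same_results ()
{
  CallCounter c1, c2;
  std::vector<float> a = sample_input ();
  return convert_unmasked (a, c1) == convert_masked (a, c2);
}
```

In this model both forms compute the same b[], but only the unmasked one
ever passes an out-of-range input to the math routine, which is the
situation the patch is trying to avoid for library-provided
implementations.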
Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support
> On 3 Oct 2024, at 21:44, Christophe Lyon wrote: > > External email: Use caution opening links or attachments > > > When --enable-werror is enabled when running the top-level configure, > it passes --enable-werror-always to subdirs. Some of them, like > libgcc, ignore it. > > This patch adds support for it, enabled only for aarch64, to avoid > breaking bootstrap for other targets. > The aarch64 part is ok but you’ll need a wider libgcc approval. It seems to me that if libgcc is intended to compile cleanly with -Werror then it should be a libgcc-wide change, but maybe doing it port-by-port is the only practical way of getting there? Thanks, Kyrill > The patch also adds -Wno-prio-ctor-dtor to avoid a warning when compiling > lse_init.c > >libgcc/ >* Makefile.in (WERROR): New. >* config/aarch64/t-aarch64: Handle WERROR. Always use >-Wno-prio-ctor-dtor. >* configure.ac: Add support for --enable-werror-always. >* configure: Regenerate. > --- > libgcc/Makefile.in | 1 + > libgcc/config/aarch64/t-aarch64 | 1 + > libgcc/configure| 31 +++ > libgcc/configure.ac | 5 + > 4 files changed, 38 insertions(+) > > diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in > index 0e46e9ef768..eca62546642 100644 > --- a/libgcc/Makefile.in > +++ b/libgcc/Makefile.in > @@ -84,6 +84,7 @@ AR_FLAGS = rc > > CC = @CC@ > CFLAGS = @CFLAGS@ > +WERROR = @WERROR@ > RANLIB = @RANLIB@ > LN_S = @LN_S@ > > diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64 > index b70e7b94edd..ae1588ce307 100644 > --- a/libgcc/config/aarch64/t-aarch64 > +++ b/libgcc/config/aarch64/t-aarch64 > @@ -30,3 +30,4 @@ LIB2ADDEH += \ >$(srcdir)/config/aarch64/__arm_za_disable.S > > SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver > +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor > diff --git a/libgcc/configure b/libgcc/configure > index cff1eff9625..ae56f7dbdc9 100755 > --- a/libgcc/configure > +++ b/libgcc/configure > @@ -592,6 +592,7 @@ enable_execute_stack > asm_hidden_op > extra_parts > 
cpu_type > +WERROR > get_gcc_base_ver > HAVE_STRUB_SUPPORT > thread_header > @@ -719,6 +720,7 @@ enable_tm_clone_registry > with_glibc_version > enable_tls > with_gcc_major_version_only > +enable_werror_always > ' > ac_precious_vars='build_alias > host_alias > @@ -1361,6 +1363,7 @@ Optional Features: > installations without PT_GNU_EH_FRAME support > --disable-tm-clone-registrydisable TM clone registry > --enable-tlsUse thread-local storage [default=yes] > + --enable-werror-always enable -Werror despite compiler version > > Optional Packages: > --with-PACKAGE[=ARG]use PACKAGE [ARG=yes] > @@ -5808,6 +5811,34 @@ fi > > > > +# Only enable with --enable-werror-always until existing warnings are > +# corrected. > +ac_ext=c > +ac_cpp='$CPP $CPPFLAGS' > +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' > +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS > conftest.$ac_ext $LIBS >&5' > +ac_compiler_gnu=$ac_cv_c_compiler_gnu > + > +WERROR= > +# Check whether --enable-werror-always was given. 
> +if test "${enable_werror_always+set}" = set; then : > + enableval=$enable_werror_always; > +else > + enable_werror_always=no > +fi > + > +if test $enable_werror_always = yes; then : > + WERROR="$WERROR${WERROR:+ }-Werror" > +fi > + > +ac_ext=c > +ac_cpp='$CPP $CPPFLAGS' > +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' > +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS > conftest.$ac_ext $LIBS >&5' > +ac_compiler_gnu=$ac_cv_c_compiler_gnu > + > + > + > # Substitute configuration variables > > > diff --git a/libgcc/configure.ac b/libgcc/configure.ac > index 4e8c036990f..6b3ea2aea5c 100644 > --- a/libgcc/configure.ac > +++ b/libgcc/configure.ac > @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4) > sinclude(../config/gthr.m4) > sinclude(../config/sjlj.m4) > sinclude(../config/cet.m4) > +sinclude(../config/warnings.m4) > > AC_INIT([GNU C Runtime Library], 1.0,,[libgcc]) > AC_CONFIG_SRCDIR([static-object.mk]) > @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT) > # Determine what GCC version number to use in filesystem paths. > GCC_BASE_VER > > +# Only enable with --enable-werror-always until existing warnings are > +# corrected. > +ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual]) > + > # Substitute configuration variables > AC_SUBST(cpu_type) > AC_SUBST(extra_parts) > -- > 2.34.1 >
Re: [PATCH 0/2] aarch64: remove SVE2 requirement from SME and diagnose it as unsupported
Hi Andre,

> On 2 Oct 2024, at 19:13, Andre Vieira wrote:
>
> This patch series removes the requirement of SVE2 for SME, so when a user
> passes +sme, SVE2 is not enabled as a result of that.
> We do this to be compliant with the ISA and behave in a compatible manner to
> other toolchains, to prevent unexpected behavior when switching between them.
>
> However, for the time being we diagnose the use of SME without SVE2 as
> unsupported, we suspect that the backend correctly enables and disables the
> right instructions given the options, but we believe that for certain codegen
> there are assumptions that SVE & SVE2 is present when using SME.  Before we
> fully support this combination we should investigate these.

Is that something you intend to do for GCC 15.1?
I’m not a fan of the warning in patch [2/2].  If the compiler is at risk of
crashing, generating wrong code, or emitting SVE code in non-streaming regions
or other such violations then we should mark it as unsupported with an error.
Usually diagnostics about “I could support it but I don’t” use the sorry ()
API for this reason.

Thanks,
Kyrill

>
> The patch series also refactors the FCMA/COMPNUM/TARGET_COMPLEX feature to
> separate it from Armv8.3-A feature set.
>
> Andre Vieira (2)
>   aarch64: Split FCMA feature bit from Armv8.3-A
>   aarch64: remove SVE2 requirement from SME and diagnose it as unsupported
>
> Regression tested on aarch64-none-linux-gnu.
>
> OK for trunk?
>
> Andre Vieira (2):
>   aarch64: Split FCMA feature bit from Armv8.3-A
>   aarch64: remove SVE2 requirement from SME and diagnose it as unsupported
>
>  gcc/config/aarch64/aarch64-arches.def | 2 +-
>  gcc/config/aarch64/aarch64-option-extensions.def | 4 +++-
>  gcc/config/aarch64/aarch64.cc | 4
>  gcc/config/aarch64/aarch64.h | 2 +-
>  .../aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c | 2 +-
>  .../aarch64/sve/acle/general-c/binary_opt_single_n_2.c | 2 +-
>  .../gcc.target/aarch64/sve/acle/general-c/binary_single_1.c | 2 +-
>  .../gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/clamp_1.c | 2 +-
>  .../aarch64/sve/acle/general-c/compare_scalar_count_1.c | 2 +-
>  .../aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c | 2 +-
>  .../gcc.target/aarch64/sve/acle/general-c/storexn_1.c | 2 +-
>  .../aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c | 2 +-
>  .../gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c | 2 +-
>  .../gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c | 2 +-
>  15 files changed, 20 insertions(+), 14 deletions(-)
>
> --
> 2.25.1
>
Re: [PATCH] c++/modules: Merge default arguments [PR99274]
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660134.html. On Thu, Sep 12, 2024 at 01:35:38PM -0400, Patrick Palka wrote: > On Fri, 23 Aug 2024, Nathaniel Shead wrote: > > > On Thu, Aug 22, 2024 at 02:20:14PM -0400, Patrick Palka wrote: > > > On Mon, 12 Aug 2024, Nathaniel Shead wrote: > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > > > > > > > I tried to implement a remapping of the slots for TARGET_EXPRs for the > > > > FIXME but I wasn't able to work out how to do so effectively. Given > > > > that I doubt this will be a common issue I felt probably easiest to > > > > leave it for now and focus on other issues in the meantime; thoughts? > > > > > > > > The other thing to note is that most of this function just has a single > > > > error message always indicated by a 'goto mismatch;' but I felt that it > > > > seemed reasonable to provide more specific error messages where we can. > > > > But given that in the long term we probably want to replace this > > > > function with an appropriately enhanced 'duplicate_decls' anyway maybe > > > > it's not worth worrying about; this patch is still useful in the > > > > meantime if only for the testcases, I hope. > > > > > > > > -- >8 -- > > > > > > > > When merging a newly imported declaration with an existing declaration > > > > we don't currently propagate new default arguments, which causes issues > > > > when modularising header units. This patch adds logic to propagate > > > > default arguments to existing declarations on import, and error if the > > > > defaults do not match. > > > > > > > > PR c++/99274 > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > * module.cc (trees_in::is_matching_decl): Merge default > > > > arguments. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * g++.dg/modules/default-arg-1_a.H: New test. > > > > * g++.dg/modules/default-arg-1_b.C: New test. > > > > * g++.dg/modules/default-arg-2_a.H: New test. 
> > > > * g++.dg/modules/default-arg-2_b.C: New test. > > > > * g++.dg/modules/default-arg-3.h: New test. > > > > * g++.dg/modules/default-arg-3_a.H: New test. > > > > * g++.dg/modules/default-arg-3_b.C: New test. > > > > > > > > Signed-off-by: Nathaniel Shead > > > > --- > > > > gcc/cp/module.cc | 62 ++- > > > > .../g++.dg/modules/default-arg-1_a.H | 17 + > > > > .../g++.dg/modules/default-arg-1_b.C | 26 > > > > .../g++.dg/modules/default-arg-2_a.H | 17 + > > > > .../g++.dg/modules/default-arg-2_b.C | 28 + > > > > gcc/testsuite/g++.dg/modules/default-arg-3.h | 13 > > > > .../g++.dg/modules/default-arg-3_a.H | 5 ++ > > > > .../g++.dg/modules/default-arg-3_b.C | 6 ++ > > > > 8 files changed, 171 insertions(+), 3 deletions(-) > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-1_a.H > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-1_b.C > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-2_a.H > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-2_b.C > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3.h > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3_a.H > > > > create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3_b.C > > > > > > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc > > > > index f4d137b13a1..87f34bac578 100644 > > > > --- a/gcc/cp/module.cc > > > > +++ b/gcc/cp/module.cc > > > > @@ -11551,8 +11551,6 @@ trees_in::is_matching_decl (tree existing, tree > > > > decl, bool is_typedef) > > > > > > > > if (!same_type_p (TREE_VALUE (d_args), TREE_VALUE (e_args))) > > > > goto mismatch; > > > > - > > > > - // FIXME: Check default values > > > > } > > > > > > > >/* If EXISTING has an undeduced or uninstantiated exception > > > > @@ -11690,7 +11688,65 @@ trees_in::is_matching_decl (tree existing, > > > > tree decl, bool is_typedef) > > > >if (!DECL_EXTERNAL (d_inner)) > > > > DECL_EXTERNAL (e_inner) = false; > > > > > > > > - // FIXME: Check 
default tmpl and fn parms here > > > > + if (TREE_CODE (decl) == TEMPLATE_DECL) > > > > +{ > > > > + /* Merge default template arguments. */ > > > > + tree d_parms = DECL_INNERMOST_TEMPLATE_PARMS (decl); > > > > + tree e_parms = DECL_INNERMOST_TEMPLATE_PARMS (existing); > > > > + gcc_checking_assert (TREE_VEC_LENGTH (d_parms) > > > > + == TREE_VEC_LENGTH (e_parms)); > > > > + for (int i = 0; i < TREE_VEC_LENGTH (d_parms); ++i) > > > > + { > > > > + tree d_default = TREE_PURPOSE (TREE_VEC_ELT (d_parms, i)); > > > > + tree& e_default = TREE_PURPOSE (TREE_VEC_ELT (e_parms, i)); > > > > + if (e_default == NULL_TREE) > > > > + e_d
Re: [PATCH v5] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]
Ping for -Wno-pragma-once-outside-header. On Thursday, June 27th, 2024 at 11:00 AM, Ken Matsui wrote: > > > Ping. > > > On Sat, Jun 15, 2024 at 10:30 PM Ken Matsui kmat...@gcc.gnu.org wrote: > > > This patch adds a warning switch for "#pragma once in main file". The > > warning option name is Wpragma-once-outside-header, which is the same > > as Clang provides. > > > > PR preprocessor/89808 > > > > gcc/c-family/ChangeLog: > > > > * c.opt (Wpragma_once_outside_header): Define new option. > > * c.opt.urls: Regenerate. > > > > gcc/ChangeLog: > > > > * doc/invoke.texi (Warning Options): Document > > -Wno-pragma-once-outside-header. > > > > libcpp/ChangeLog: > > > > * include/cpplib.h (cpp_warning_reason): Define > > CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER. > > * directives.cc (do_pragma_once): Use > > CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/warn/Wno-pragma-once-outside-header.C: New test. > > * g++.dg/warn/Wpragma-once-outside-header.C: New test. > > > > Signed-off-by: Ken Matsui kmat...@gcc.gnu.org > > --- > > gcc/c-family/c.opt | 4 > > gcc/c-family/c.opt.urls | 3 +++ > > gcc/doc/invoke.texi | 10 -- > > .../g++.dg/warn/Wno-pragma-once-outside-header.C | 5 + > > .../g++.dg/warn/Wpragma-once-outside-header.C | 6 ++ > > libcpp/directives.cc | 3 ++- > > libcpp/include/cpplib.h | 3 ++- > > 7 files changed, 30 insertions(+), 4 deletions(-) > > create mode 100644 > > gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C > > create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C > > > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > > index 403abc1f26e..3439f36fe45 100644 > > --- a/gcc/c-family/c.opt > > +++ b/gcc/c-family/c.opt > > @@ -1188,6 +1188,10 @@ Wpragmas > > C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning > > Warn about misuses of pragmas. 
> > > > +Wpragma-once-outside-header > > +C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) > > CppReason(CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER) Init(1) Warning > > +Warn about #pragma once outside of a header. > > + > > Wprio-ctor-dtor > > C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning > > Warn if constructor or destructors with priorities from 0 to 100 are used. > > diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls > > index dd455d7c0dc..778ca08be2e 100644 > > --- a/gcc/c-family/c.opt.urls > > +++ b/gcc/c-family/c.opt.urls > > @@ -672,6 +672,9 @@ > > UrlSuffix(gcc/Warning-Options.html#index-Wno-pointer-to-int-cast) > > Wpragmas > > UrlSuffix(gcc/Warning-Options.html#index-Wno-pragmas) > > > > +Wpragma-once-outside-header > > +UrlSuffix(gcc/Warning-Options.html#index-Wno-pragma-once-outside-header) > > + > > Wprio-ctor-dtor > > UrlSuffix(gcc/Warning-Options.html#index-Wno-prio-ctor-dtor) > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index 9456ced468a..c7f17ca9eb7 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}. > > -Wpacked -Wno-packed-bitfield-compat -Wpacked-not-aligned -Wpadded > > -Wparentheses -Wno-pedantic-ms-format > > -Wpointer-arith -Wno-pointer-compare -Wno-pointer-to-int-cast > > --Wno-pragmas -Wno-prio-ctor-dtor -Wredundant-decls > > --Wrestrict -Wno-return-local-addr -Wreturn-type > > +-Wno-pragmas -Wno-pragma-once-outside-header -Wno-prio-ctor-dtor > > +-Wredundant-decls -Wrestrict -Wno-return-local-addr -Wreturn-type > > -Wno-scalar-storage-order -Wsequence-point > > -Wshadow -Wshadow=global -Wshadow=local -Wshadow=compatible-local > > -Wno-shadow-ivar > > @@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as > > incorrect parameters, > > invalid syntax, or conflicts between pragmas. See also > > @option{-Wunknown-pragmas}. 
> > > > +@opindex Wno-pragma-once-outside-header > > +@opindex Wpragma-once-outside-header > > +@item -Wno-pragma-once-outside-header > > +Do not warn when @code{#pragma once} is used in a file that is not a header > > +file, such as a main file. > > + > > @opindex Wno-prio-ctor-dtor > > @opindex Wprio-ctor-dtor > > @item -Wno-prio-ctor-dtor > > diff --git a/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C > > b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C > > new file mode 100644 > > index 000..b5be4d25a9d > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C > > @@ -0,0 +1,5 @@ > > +// { dg-do assemble } > > +// { dg-options "-Wno-pragma-once-outside-header" } > > + > > +#pragma once > > +int main() {} > > diff --git a/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C > > b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C > > new file mode 100644 > > index 000..29f09b69f71 > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C > > @@ -0,0 +1,6 @@ > > +// { dg-do assemble } > > +// { dg-options "-Werror=pragma-once-outside-header" } > > +// { dg-message "some
Re: [PATCH 1/3] c++: Handle ABI for non-polymorphic dynamic classes
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660956.html. On Wed, Aug 21, 2024 at 09:38:44AM +1000, Nathaniel Shead wrote: > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > -- >8 -- > > The Itanium ABI has specific rules for when virtual tables for dynamic > classes should be emitted. However we didn't consider structures with > virtual inheritance but no virtual members as dynamic classes for ABI > purposes; this patch fixes this. > > gcc/cp/ChangeLog: > > * decl2.cc (import_export_class): Use TYPE_CONTAINS_VPTR_P > instead of TYPE_POLYMORPHIC_P. > (import_export_decl): Likewise. > > gcc/testsuite/ChangeLog: > > * g++.dg/modules/virt-5_a.C: New test. > * g++.dg/modules/virt-5_b.C: New test. > > Signed-off-by: Nathaniel Shead > --- > gcc/cp/decl2.cc | 4 ++-- > gcc/testsuite/g++.dg/modules/virt-5_a.C | 16 > gcc/testsuite/g++.dg/modules/virt-5_b.C | 11 +++ > 3 files changed, 29 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/modules/virt-5_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/virt-5_b.C > > diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc > index e9ae979896c..af544f40dac 100644 > --- a/gcc/cp/decl2.cc > +++ b/gcc/cp/decl2.cc > @@ -2431,7 +2431,7 @@ import_export_class (tree ctype) > translation unit, then export the class; otherwise, import > it. 
*/ >import_export = -1; > - else if (TYPE_POLYMORPHIC_P (ctype)) > + else if (TYPE_CONTAINS_VPTR_P (ctype)) > { >tree cdecl = TYPE_NAME (ctype); >if (DECL_LANG_SPECIFIC (cdecl) && DECL_MODULE_ATTACH_P (cdecl)) > @@ -3527,7 +3527,7 @@ import_export_decl (tree decl) > class_type = type; > import_export_class (type); > if (CLASSTYPE_INTERFACE_KNOWN (type) > - && TYPE_POLYMORPHIC_P (type) > + && TYPE_CONTAINS_VPTR_P (type) > && CLASSTYPE_INTERFACE_ONLY (type) > /* If -fno-rtti was specified, then we cannot be sure >that RTTI information will be emitted with the > diff --git a/gcc/testsuite/g++.dg/modules/virt-5_a.C > b/gcc/testsuite/g++.dg/modules/virt-5_a.C > new file mode 100644 > index 000..f4c6abe85ef > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/virt-5_a.C > @@ -0,0 +1,16 @@ > +// { dg-additional-options "-fmodules-ts" } > +// { dg-module-cmi M } > + > +export module M; > + > +struct C {}; > +struct B : virtual C {}; > + > +// Despite no non-inline key function, this is still a dynamic class > +// and so by the Itanium ABI 5.2.3 should be uniquely emitted in this TU > +export struct A : B { > + inline A (int) {} > +}; > + > +// { dg-final { scan-assembler {_ZTTW1M1A:} } } > +// { dg-final { scan-assembler {_ZTVW1M1A:} } } > diff --git a/gcc/testsuite/g++.dg/modules/virt-5_b.C > b/gcc/testsuite/g++.dg/modules/virt-5_b.C > new file mode 100644 > index 000..785dd92ac1e > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/virt-5_b.C > @@ -0,0 +1,11 @@ > +// { dg-module-do link } > +// { dg-additional-options "-fmodules-ts" } > + > +import M; > + > +int main() { > + A a(0); > +} > + > +// { dg-final { scan-assembler-not {_ZTTW1M1A:} } } > +// { dg-final { scan-assembler-not {_ZTVW1M1A:} } } > -- > 2.43.2 >
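The distinction the patch relies on (a class that is dynamic for ABI purposes without being polymorphic) can be approximated with standard type traits. This is only a sketch: TYPE_POLYMORPHIC_P and TYPE_CONTAINS_VPTR_P are GCC-internal macros, and using std::is_polymorphic and std::is_empty as stand-ins for them is an assumption made for illustration. The struct P is hypothetical and not part of the testcase.

```cpp
#include <type_traits>

struct C {};
struct B : virtual C {};   // virtual base, but no virtual member functions

// Hypothetical comparison class (not in the patch): has a virtual function.
struct P { virtual void f () {} };

// B declares and inherits no virtual functions, so it is not polymorphic...
static_assert (!std::is_polymorphic<B>::value, "B is not polymorphic");
static_assert (std::is_polymorphic<P>::value, "P is polymorphic");

// ...yet it is not an empty class either: the virtual base means an
// object of type B has to carry bookkeeping to locate its C subobject
// (a vptr under the Itanium ABI, which is an ABI detail the standard
// traits do not expose directly).
static_assert (std::is_empty<C>::value, "C is empty");
static_assert (!std::is_empty<B>::value, "B is not empty");
```

A check written in terms of "is polymorphic" therefore skips classes like B, which is the gap the switch to TYPE_CONTAINS_VPTR_P appears intended to close.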
Re: [PATCH 2/3] c++/modules: Prevent maybe_clone_decl being called multiple times [PR115007]
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660957.html On Wed, Aug 21, 2024 at 09:40:25AM +1000, Nathaniel Shead wrote: > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > -- >8 -- > > The ICE in the linked PR is caused because maybe_clone_decl is not > prepared to be called on a declaration that has already had clones > created; what happens otherwise is that start_preparsed_function early > exits and never sets up cfun, causing a segfault later on. > > To fix this we ensure that post_load_processing only calls > maybe_clone_decl if TREE_ASM_WRITTEN has not been marked on the > declaration yet, and (if maybe_clone_decls succeeds) marks this flag on > the decl so that it doesn't get called again later when finalising > deferred vague linkage declarations in c_parse_final_cleanups. > > As a bonus this now allows us to only keep the DECL_SAVED_TREE around in > expand_or_defer_fn_1 for modules which have CMIs, which will have > benefits for LTO performance in non-interface TUs. > > For clarity we also update the streaming code to do post_load_decls for > maybe in-charge cdtors rather than any DECL_ABSTRACT_P declaration, as > this is more accurate to the decls affected by maybe_clone_body. > > PR c++/115007 > > gcc/cp/ChangeLog: > > * module.cc (module_state::read_cluster): Replace > DECL_ABSTRACT_P with DECL_MAYBE_IN_CHARGE_CDTOR_P. > (post_load_processing): Check and mark TREE_ASM_WRITTEN. > * semantics.cc (expand_or_defer_fn_1): Use the more specific > module_maybe_has_cmi_p instead of modules_p. > > gcc/testsuite/ChangeLog: > > * g++.dg/modules/virt-6_a.C: New test. > * g++.dg/modules/virt-6_b.C: New test. 
> > Signed-off-by: Nathaniel Shead > --- > gcc/cp/module.cc| 7 --- > gcc/cp/semantics.cc | 2 +- > gcc/testsuite/g++.dg/modules/virt-6_a.C | 13 + > gcc/testsuite/g++.dg/modules/virt-6_b.C | 6 ++ > 4 files changed, 24 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/modules/virt-6_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/virt-6_b.C > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc > index 7c42aea05ee..5cd4f313933 100644 > --- a/gcc/cp/module.cc > +++ b/gcc/cp/module.cc > @@ -15525,7 +15525,7 @@ module_state::read_cluster (unsigned snum) > >if (abstract) > ; > - else if (DECL_ABSTRACT_P (decl)) > + else if (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl)) > vec_safe_push (post_load_decls, decl); >else > { > @@ -17947,10 +17947,11 @@ post_load_processing () > >dump () && dump ("Post-load processing of %N", decl); > > - gcc_checking_assert (DECL_ABSTRACT_P (decl)); > + gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl)); >/* Cloning can cause loading -- specifically operator delete for >the deleting dtor. */ > - maybe_clone_body (decl); > + if (!TREE_ASM_WRITTEN (decl) && maybe_clone_body (decl)) > + TREE_ASM_WRITTEN (decl) = 1; > } > >cfun = old_cfun; > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc > index 5ab2076b673..f7ae8e68dcf 100644 > --- a/gcc/cp/semantics.cc > +++ b/gcc/cp/semantics.cc > @@ -5122,7 +5122,7 @@ expand_or_defer_fn_1 (tree fn) >demand, so we also need to keep the body. Otherwise we don't >need it anymore. 
*/ >if (!DECL_DECLARED_CONSTEXPR_P (fn) > - && !(modules_p () && vague_linkage_p (fn))) > + && !(module_maybe_has_cmi_p () && vague_linkage_p (fn))) > DECL_SAVED_TREE (fn) = NULL_TREE; >return false; > } > diff --git a/gcc/testsuite/g++.dg/modules/virt-6_a.C > b/gcc/testsuite/g++.dg/modules/virt-6_a.C > new file mode 100644 > index 000..68e466ace3f > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/virt-6_a.C > @@ -0,0 +1,13 @@ > +// PR c++/115007 > +// { dg-additional-options "-fmodules-ts -Wno-global-module" } > +// { dg-module-cmi M:a } > + > +module; > +struct S { > + virtual ~S() = default; > + virtual void f() = 0; > +}; > +module M:a; > +extern S* p; > +template void format(T) { p->~S(); } > +template void format(int); > diff --git a/gcc/testsuite/g++.dg/modules/virt-6_b.C > b/gcc/testsuite/g++.dg/modules/virt-6_b.C > new file mode 100644 > index 000..c53f5fac742 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/virt-6_b.C > @@ -0,0 +1,6 @@ > +// PR c++/115007 > +// { dg-additional-options "-fmodules-ts" } > +// { dg-module-cmi M } > + > +export module M; > +import :a; > -- > 2.43.2 >
Re: [PATCH 3/3] c++/modules: Support decloned cdtors
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660958.html On Wed, Aug 21, 2024 at 09:41:31AM +1000, Nathaniel Shead wrote: > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > -- >8 -- > > When compiling with '-fdeclone-ctor-dtor' (enabled by default with -Os), > we run into issues where we don't correctly emit the underlying > functions. We also need to ensure that COMDAT constructors are marked > as such before 'maybe_clone_body' attempts to propagate COMDAT groups to > the new thunks. > > gcc/cp/ChangeLog: > > * module.cc (post_load_processing): Mark COMDAT as needed, emit > declarations if maybe_clone_body fails. > > gcc/testsuite/ChangeLog: > > * g++.dg/modules/clone-2_a.C: New test. > * g++.dg/modules/clone-2_b.C: New test. > * g++.dg/modules/clone-3_a.C: New test. > * g++.dg/modules/clone-3_b.C: New test. > > Signed-off-by: Nathaniel Shead > --- > gcc/cp/module.cc | 20 > gcc/testsuite/g++.dg/modules/clone-2_a.C | 7 +++ > gcc/testsuite/g++.dg/modules/clone-2_b.C | 5 + > gcc/testsuite/g++.dg/modules/clone-3_a.C | 9 + > gcc/testsuite/g++.dg/modules/clone-3_b.C | 8 > 5 files changed, 45 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/modules/clone-2_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/clone-2_b.C > create mode 100644 gcc/testsuite/g++.dg/modules/clone-3_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/clone-3_b.C > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc > index 5cd4f313933..9a9c0fdfe81 100644 > --- a/gcc/cp/module.cc > +++ b/gcc/cp/module.cc > @@ -17948,10 +17948,22 @@ post_load_processing () >dump () && dump ("Post-load processing of %N", decl); > >gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl)); > - /* Cloning can cause loading -- specifically operator delete for > - the deleting dtor. 
*/ > - if (!TREE_ASM_WRITTEN (decl) && maybe_clone_body (decl)) > - TREE_ASM_WRITTEN (decl) = 1; > + > + if (DECL_COMDAT (decl)) > + comdat_linkage (decl); > + if (!TREE_ASM_WRITTEN (decl)) > + { > + /* Cloning can cause loading -- specifically operator delete for > + the deleting dtor. */ > + if (maybe_clone_body (decl)) > + TREE_ASM_WRITTEN (decl) = 1; > + else > + { > + /* We didn't clone the cdtor, make sure we emit it. */ > + note_vague_linkage_fn (decl); > + cgraph_node::finalize_function (decl, true); > + } > + } > } > >cfun = old_cfun; > diff --git a/gcc/testsuite/g++.dg/modules/clone-2_a.C > b/gcc/testsuite/g++.dg/modules/clone-2_a.C > new file mode 100644 > index 000..47e21581fdc > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/clone-2_a.C > @@ -0,0 +1,7 @@ > +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" } > +// { dg-module-cmi M } > + > +export module M; > +export struct S { > + inline S(int) {} > +}; > diff --git a/gcc/testsuite/g++.dg/modules/clone-2_b.C > b/gcc/testsuite/g++.dg/modules/clone-2_b.C > new file mode 100644 > index 000..80c1e149518 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/clone-2_b.C > @@ -0,0 +1,5 @@ > +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" } > + > +import M; > + > +S s(0); > diff --git a/gcc/testsuite/g++.dg/modules/clone-3_a.C > b/gcc/testsuite/g++.dg/modules/clone-3_a.C > new file mode 100644 > index 000..87de746f5c2 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/clone-3_a.C > @@ -0,0 +1,9 @@ > +// { dg-additional-options "-fmodules-ts -fdeclone-ctor-dtor" } > +// { dg-module-cmi M } > + > +export module M; > + > +struct A {}; > +export struct B : virtual A { > + inline B (int) {} > +}; > diff --git a/gcc/testsuite/g++.dg/modules/clone-3_b.C > b/gcc/testsuite/g++.dg/modules/clone-3_b.C > new file mode 100644 > index 000..23c9ac4a804 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/clone-3_b.C > @@ -0,0 +1,8 @@ > +// { dg-module-do link } > +// { dg-additional-options 
"-fmodules-ts -fdeclone-ctor-dtor" } > + > +import M; > + > +int main() { > + B b(0); > +} > -- > 2.43.2 >
Re: [PATCH 2/2] aarch64: remove SVE2 requirement from SME and diagnose it as unsupported
Hi Andre, > On 2 Oct 2024, at 19:13, Andre Vieira wrote: > > External email: Use caution opening links or attachments > > > As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2, so we are removing > that false dependency in GCC. However, we chose for now to not support this > combination of features and will diagnose the combination of FEAT_SME without > FEAT_SVE2 as unsupported by GCC. We may choose to support this in the future. > > gcc/ChangeLog: > >* config/aarch64/aarch64-arches.def (SME): Remove SVE2 as prerequisite >and add in FCMA and F16FML. >* config/aarch64/aarch64.cc (aarch64_override_options): Diagnose use of >SME without SVE2. > > gcc/testsuite/ChangeLog: > >* gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c: >Pass +sve2 to existing +sme pragma. >* gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c: >Likewise. >* gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise. >* gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise. >* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise. >* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c: >Likewise. >* gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c: >Likewise. >* gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise. >* gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c: >Likewise. >* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise. >* gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise. 
> --- > gcc/config/aarch64/aarch64-option-extensions.def | 3 ++- > gcc/config/aarch64/aarch64.cc | 4 > .../aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c| 2 +- > .../aarch64/sve/acle/general-c/binary_opt_single_n_2.c| 2 +- > .../gcc.target/aarch64/sve/acle/general-c/binary_single_1.c | 2 +- > .../gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c| 2 +- > gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/clamp_1.c | 2 +- > .../aarch64/sve/acle/general-c/compare_scalar_count_1.c | 2 +- > .../aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c | 2 +- > .../gcc.target/aarch64/sve/acle/general-c/storexn_1.c | 2 +- > .../aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c | 2 +- > .../gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c | 2 +- > .../gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c | 2 +- > 13 files changed, 17 insertions(+), 12 deletions(-) > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 68913beaee2..bc2023da180 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -18998,6 +18998,10 @@ aarch64_override_options (void) while processing functions with potential target attributes. */ target_option_default_node = target_option_current_node = build_target_option_node (&global_options, &global_options_set); + + if (TARGET_SME && !TARGET_SVE2) +warning (0, "this gcc version does not guarantee full support for +sme" + " without +sve2"); } Beyond my comments on the cover letter, if you do intend to give some message here anyway, this can be more fancy :) You can use %qs to quote the +sme and +sve2 strings and I don’t think we usually refer to GCC itself from warnings. I think a passive voice would fit better. Regardless of what we do for the warning this restriction should be documented in doc/invoke.texi if we end up having it for the GCC 15.1 release. Thanks, Kyrill
[PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]
Hi!

The PR notes that we don't emit optimal code for the C++ spaceship operator if the result is returned as an integer rather than the result just being compared against different values and different code executed based on that. So e.g. for

  template <typename T>
  auto foo (T x, T y) { return x <=> y; }

for floating point types, signed integer types and unsigned integer types. auto in that case is std::strong_ordering or std::partial_ordering, which are fancy C++ abstractions around a struct with a signed char member which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial ordering (but for -ffast-math 2 is never the case).

I'm afraid functions like that are fairly common and unless they are inlined, we really need to map the comparison to those -1, 0, 1 or -1, 0, 1, 2 values.

Now, for floating point spaceship I've in the past already added an optimization (with tree-ssa-math-opts.cc discovery and named optab, the optab only defined on x86 though right now), which ensures there is just a single comparison instruction and then just tests based on flags. Now, if we have code like:

  auto a = x <=> y;
  if (a == std::partial_ordering::less)
    bar ();
  else if (a == std::partial_ordering::greater)
    baz ();
  else if (a == std::partial_ordering::equivalent)
    qux ();
  else if (a == std::partial_ordering::unordered)
    corge ();

etc., that results in decent code generation, the spaceship named pattern on x86 optimizes for the jumps, so emits comparisons on the flags, followed by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that well.
But if the result needs to be stored into an integer and just returned that way or there are no immediate jumps based on it (or turned into some non-standard integer values like -42, 0, 36, 75 etc.), then CE doesn't do a good job for that, we end up with say

        comiss  %xmm1, %xmm0
        jp      .L4
        seta    %al
        movl    $0, %edx
        leal    -1(%rax,%rax), %eax
        cmove   %edx, %eax
        ret
.L4:
        movl    $2, %eax
        ret

The jp is good, that is the unlikely case and can't be easily handled in straight line code due to the layout of the flags, but the rest uses cmov which often isn't a win and a weird math. With the patch below we can get instead

        xorl    %eax, %eax
        comiss  %xmm1, %xmm0
        jp      .L2
        seta    %al
        sbbl    $0, %eax
        ret
.L2:
        movl    $2, %eax
        ret

The patch changes the discovery in the generic code, by detecting if the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in a new argument to .SPACESHIP ifn, so that the named pattern is told whether it should optimize for branches or for loading the result into a -1, 0, 1 (, 2) integer. Additionally, it doesn't detect just floating point <=> anymore, but also integer and unsigned integer, but in those cases only if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons result in good code).

The backend then can for those integer or unsigned integer <=>s return effectively (x > y) - (x < y) in a way that is efficient on the target (so for x86 with ensuring zero initialization first when needed before setcc; one for floating point and unsigned, where there is just one setcc and the second one optimized into sbb instruction, two for the signed int case). So e.g. for signed int we now emit

        xorl    %edx, %edx
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setl    %dl
        setg    %al
        subl    %edx, %eax
        ret

and for unsigned

        xorl    %eax, %eax
        cmpl    %esi, %edi
        seta    %al
        sbbb    $0, %al
        ret

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
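For readers who want the value mapping spelled out, here is a minimal source-level sketch of the -1, 0, 1 (, 2) results discussed above. The function names three_way_int and three_way_fp are hypothetical and only illustrate the arithmetic; they are not code from the patch, and whether a given compiler turns them into the setcc/sbb sequences shown above is not something this sketch demonstrates.

```cpp
#include <cmath>

// Integer case: effectively (x > y) - (x < y), giving -1, 0 or 1.
// Each comparison yields a bool that promotes to int 0 or 1.
template <typename T>
int three_way_int (T x, T y)
{
  return (x > y) - (x < y);
}

// Floating-point case: the same mapping, plus 2 for the unordered
// result (either operand is a NaN), matching the -1, 0, 1, 2 values
// described for the partial ordering.
inline int three_way_fp (double x, double y)
{
  if (std::isnan (x) || std::isnan (y))
    return 2;
  return (x > y) - (x < y);
}
```

With these definitions, three_way_int (1, 2) yields -1, three_way_int (3u, 3u) yields 0, and three_way_fp (1.0, std::nan ("")) yields 2.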
Note, I wonder if other targets wouldn't benefit from defining the named optab too... 2024-10-04 Jakub Jelinek PR middle-end/116896 * optabs.def (spaceship_optab): Use spaceship$a4 rather than spaceship$a3. * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments rather than 2, expand the last one, expect 4 operands of spaceship_optab. * tree-ssa-math-opts.cc: Include cfghooks.h. (optimize_spaceship): Check if a single PHI is initialized to -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new) argument to .SPACESHIP and optimize away the comparisons, otherwise pass 0. Also check for integer comparisons rather than floating point, in that case do it only if there is a single PHI with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP if the <=> is signed, 2 if unsigned. * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add another rtx argument. (ix86_expand_int_spaceship): Declare. * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add arg3 argument, if it