[PATCH] c++: Fix typo in RAW_DATA_CST build_list_conv subsubconv hanling [PR119563]
Hi! The following testcase ICEs (the embed one actually doesn't but dereferences random uninitialized pointer far after allocated memory) because of a typo. In the RAW_DATA_CST handling of list conversion where there are conversions to something other than initializer_list<{{,un}signed ,}char>, the code now calls implicit_conversion for all the RAW_DATA_CST elements and stores them into subsubconvs array. The next loop (done in a separate loop because subsubconvs[0] is handled differently) attempts to do the for (i = 0; i < len; ++i) { conversion *sub = subconvs[i]; if (sub->rank > t->rank) t->rank = sub->rank; if (sub->user_conv_p) t->user_conv_p = true; if (sub->bad_p) t->bad_p = true; } rank/user_conv_p/bad_p merging, but I mistyped the index, the loop iterates with j iterator and i is subconvs index, so the loop effectively doesn't do anything interesting except for merging from one of the subsubconvs element, if lucky within the subsubconvs array, if unlucky not even from inside of the array. The following patch fixes that. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-04-01 Andrew Pinski Jakub Jelinek PR c++/119563 * call.cc (build_list_conv): Fix a typo in loop gathering summary information from subsubconvs. * g++.dg/cpp0x/pr119563.C: New test. * g++.dg/cpp/embed-26.C: New test. --- gcc/cp/call.cc.jj 2025-03-06 11:08:20.657231238 +0100 +++ gcc/cp/call.cc 2025-04-01 11:25:14.030321320 +0200 @@ -917,7 +917,7 @@ build_list_conv (tree type, tree ctor, i t->rank = cr_exact; for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j) { - sub = subsubconvs[i]; + sub = subsubconvs[j]; if (sub->rank > t->rank) t->rank = sub->rank; if (sub->user_conv_p) --- gcc/testsuite/g++.dg/cpp0x/pr119563.C.jj2025-04-01 11:14:24.702394749 +0200 +++ gcc/testsuite/g++.dg/cpp0x/pr119563.C 2025-04-01 11:13:50.962866358 +0200 @@ -0,0 +1,79 @@ +// PR c++/119563 +// { dg-do run { target c++11 } } +// { dg-options "-O2" } + +namespace std { +template +struct initializer_list { +private: + T *_M_array; + decltype (sizeof 0) _M_len; +public: + constexpr decltype (sizeof 0) + size () const noexcept { return _M_len; } + constexpr const T * + begin () const noexcept { return _M_array; } + constexpr const T * + end () const noexcept { return begin () + size (); } +}; +} + +struct A {} a; + +struct B { + constexpr B (int x) : B (a, x) {} + template + constexpr B (A, T... x) : b(x...) {} + int b; +}; + +struct C { + C (std::initializer_list x) + { +if (x.size () != 130 + 1024 + 130) + __builtin_abort (); +unsigned int i = 1, j = 0; +for (auto a = x.begin (); a < x.end (); ++a) + if (a->b != i) + __builtin_abort (); + else + { + if (j == 129 || j == 129 + 1024) + i = 0; + i = (i & 15) + 1; + ++j; + } +c = true; + } + bool c; +}; + +#define D 1 + 0, 2 + 0, 3 + 0, 4 + 0, 5 + 0, 6 + 0, 7 + 0, 8 + 0, \ + 9 + 0, 10 + 0, 11 + 0, 12 + 0, 13 + 0, 14 + 0, 15 + 0, 16 + 0 +#define E D, D, D, D, D, D, D, D + +C c { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, E, E, E, E, E, E, E, E, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, + 1, 2 }; + +int +main () +{ + if (!c.c) +__builtin_abort (); +} --- gcc/testsuite/g++.dg/cpp/embed-26.C.jj 2025-04-01 11:18:56.563595381 +0200 +++ gcc/testsuite/g++.dg/cpp/embed-26.C 2025-04-01 11:18:35.444890450 +0200 @@ -0,0 +1,63 @@ +// PR c++/119563 +// { dg-do run { target c++11 } } +// { dg-options "-O2" } + +namespace std { +template +struct initializer_list { +private: + T *_M_array; + decltype (sizeof 0) _M_len; +public: + constexpr decltype (sizeof 0) + size () const noexcept { return _M_len; } + constexpr const T * + begin () const noexcept { return _M_array; } + constexpr const T * + end () const noexcept { return begin () + size ();
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
> On Apr 1, 2025, at 10:04, Martin Uecker wrote: > > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling: >>> I'd like to offer up this to solve the issues we're facing. This is a >>> combination of everything that's been discussed here (or at least that >>> I've been able to read in the centi-thread :-). > > Thanks! I think this proposal much better as it avoids undue burden > on parsers, but it does not address all my concerns. > > > From my side, the issue about compromising the scoping rules of C > is also about unintended non-local effects of code changes. In > my opinion, a change in a library header elsewhere should not cause > code in a local scope (which itself might also come from a macro) to > emit a warning or require a programmer to add a workaround. So I am > not convinced that adding warnings or a workaround such as > __builtin_global_ref is a good solution. > > > I could see the following as a possible way forward: We only > allow the following two syntaxes: > > 1. Single argument referring to a member. > > __counted_by(len) > > with an argument that must be a single identifier and where > the identifier then must refer to a struct member. > > (I still think this is not ideal and potentially > confusing, but in contrast to new scoping rules it is > at least relatively easily to explain as a special rule.). > > > 2. Forward declarations. > > __counted_by(size_t len; len + PADDING) In the above, the PADDING is some constant? More complicated expressions involving globals will not be supported? > > where then the second part can also be a more complicated > expression, but with the explicit requirement that all > identifiers in this expression are then looked up according to > regular C language rules. So except for the forward declared > member(s) they are *never* looked up in the member namespace of > the struct, i.e. no new name lookup rules are introduced. One question here: What’s the major issue if we’d like to add one new scoping rule, for example, “Structure scope” (the same as the C++’s instance scope) to C? (In addition to the "VLA in structure" issue I mentioned in my previous writeup, is there any other issue to prevent this new scoping rule being added into C ?). Qing > > > I think this could address my concerns about breaking > scoping in C. Still, I personally would prefer designator syntax > for both C and C++ as a nicer solution, and one that already > has some support from WG14. > > Martin > > >>> >>> --- >>> >>> 1. The use of '__self' isn't feasible, so we won't use it. Instead, >>> we'll rely upon the current behavior—resolving any identifiers to the >>> "instance scope". This new scope is used __only__ in attributes, and >>> resolves identifiers to those in the least enclosing, non-anonymous >>> struct. For example: >>> >>> struct foo { >>> char count; >>> struct bar { >>>struct { >>> int len; >>>}; >>>struct { >>> struct { >>>int *valid_use __counted_by(len); // Valid. >>> }; >>>}; >>>int *invalid_use __counted_by(count); // Invalid. >>> } b; >>> }; >>> >>> Rationale: This is how '__guarded_by' currently resolves identifiers, >>> so there's precedence. And if we can't force its usage in all >>> situations, it's less a feature and more a "nicety" which will lead to >>> a massive discrepancy between compiler implementations. Despite the >>> fact that this introduces a new scoping mechanism to C, its use is not >>> as extensive as C++'s instance scoping and will apply only to >>> attributes. In the case where we have two different resolution >>> techniquest happening within the same structure (e.g. VLAs), we can >>> issue warnings as outlined in Yeoul's RFC[1]. >>> >>> 2. A method of forward declaring variables will be added for variables >>> that occur in the struct after the attribute. For example: >>> >>> A: Necessary usage: >>> >>> struct foo { >>> int *buf __counted_by(char count; count); >>> char count; >>> }; >>> >>> B: Unnecessary, but still valid, usage: >>> >>> struct foo { >>> char count; >>> int *buf __counted_by(char count; count); >>> }; >>> >>> * The forward declaration is required in (A) but not in (B). >>> * The type of 'count' as declared in '__counted_by' *must* match the real >>> type. >>> >>> Rationale: This alleviates the issues of "double parsing" for >>> compilers that aren't able to handle it. (We can also remove the >>> '-fexperimental-late-parse-attributes' flag in Clang.) >>> >>> 3. A new builtin '__builtin_global_ref()' (or similarly named) is >>> added to refer to variables outside of the most-enclosing structure. >>> Example: >>> >>> int count_that_will_never_change_we_promise; >>> >>> struct foo { >>> int *bar >>> __counted_by(__builtin_global_ref(count_that_will_never_change_we_promise)); >>> unsigned flags; >>> }; >>> >>> As Yeoul pointed out, there isn't a way to refer to variables that >>> have been shadowed, so t
Re: [PATCH] gimple-low: Diagnose assume attr expressions defining labels which are used as unary && operands outside of those [PR119537]
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > Hi! > > The following testcases ICE on invalid code which defines > labels inside of statement expressions and then uses &&label > from code outside of the statement expressions. > The C++ FE diagnoses that with a warning (not specifically for > assume attribute, genericallly about taking address of a label > outside of a statement expression so computed goto could violate > the requirement that statement expression is not entered from > outside of it through a jump into it), the C FE doesn't diagnose > anything. > Normal direct gotos to such labels are diagnosed by both C and C++. > In the assume attribute case it is actually worse than for > addresses of labels in normal statement expressions, in that case > the labels are still in the current function, so invalid program > can still jump to those (and in case of OpenMP/OpenACC where it > is also invalid and stuff is moved to a separate function, such > movement is done post cfg discovery of FORCED_LABELs and worst > case one can run into cases which fail to assemble, but I haven't > succeeded in creating ICE for that). > For assume at -O0 we'd just throw away the assume expression if > it is not a simple condition and so the label is then not defined > anywhere and we ICE during cfg pass. > The gimplify.cc hunks fix that, as we don't have FORCED_LABELs > discovery done yet, it preserves all assume expressions which contain > used user labels. > With that we ICE during IRA, which is upset about an indirect jump > to a label which doesn't exist. > So, the gimple-low.cc hunks add diagnostics of the problem, it gathers > uids of all the user used labels inside of the assume expressions (usually > none) and if it finds any, walks the IL to find uses of those from outside > of those expressions now outlined into separate magic functions. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Richard. > 2025-04-01 Jakub Jelinek > > PR middle-end/119537 > * gimplify.cc (find_used_user_labels): New function. > (gimplify_call_expr): Don't remove complex assume expression at -O0 > if it defines any user labels. > * gimple-low.cc: Include diagnostic-core.h. > (assume_labels): New variable. > (diagnose_assume_labels): New function. > (lower_function_body): Call it via walk_gimple_seq if assume_labels > is non-NULL, then BITMAP_FREE assume_labels. > (find_assumption_locals_r): Record in assume_labels uids of user > labels defined in assume attribute expressions. > > * c-c++-common/pr119537-1.c: New test. > * c-c++-common/pr119537-2.c: New test. > > --- gcc/gimplify.cc.jj2025-03-31 12:53:44.853727077 +0200 > +++ gcc/gimplify.cc 2025-03-31 17:05:40.854893880 +0200 > @@ -4508,6 +4508,21 @@ gimplify_variant_call_expr (tree expr, f > } > > > +/* Helper function for gimplify_call_expr, called via walk_tree. > + Find used user labels. */ > + > +static tree > +find_used_user_labels (tree *tp, int *, void *) > +{ > + if (TREE_CODE (*tp) == LABEL_EXPR > + && !DECL_ARTIFICIAL (LABEL_EXPR_LABEL (*tp)) > + && DECL_NAME (LABEL_EXPR_LABEL (*tp)) > + && TREE_USED (LABEL_EXPR_LABEL (*tp))) > +return *tp; > + return NULL_TREE; > +} > + > + > /* Gimplify the CALL_EXPR node *EXPR_P into the GIMPLE sequence PRE_P. > WANT_VALUE is true if the result of the call is desired. */ > > @@ -4568,8 +4583,14 @@ gimplify_call_expr (tree *expr_p, gimple > fndecl, 0)); > return GS_OK; > } > - /* If not optimizing, ignore the assumptions. */ > - if (!optimize || seen_error ()) > + /* If not optimizing, ignore the assumptions unless there > + are used user labels in it. */ > + if ((!optimize > +&& !walk_tree_without_duplicates (&CALL_EXPR_ARG (*expr_p, > + 0), > + find_used_user_labels, > + NULL)) > + || seen_error ()) > { > *expr_p = NULL_TREE; > return GS_ALL_DONE; > --- gcc/gimple-low.cc.jj 2025-03-21 20:25:38.396064280 +0100 > +++ gcc/gimple-low.cc 2025-03-31 20:20:38.961772695 +0200 > @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. > #include "tree-inline.h" > #include "gimple-walk.h" > #include "attribs.h" > +#include "diagnostic-core.h" > > /* The differences between High GIMPLE and Low GIMPLE are the > following: > @@ -78,6 +79,10 @@ struct lower_data >bool cannot_fallthru; > }; > > +/* Bitmap of LABEL_DECL uids for user labels moved into assume outlined > + functions. */ > +static bitmap assume_labels; > + > static void lower_stmt (gimple_stmt_iterator *, struct lower_data *); > static
Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern
On 3/30/25 8:54 PM, Bohan Lei wrote: The RTL pattern has an "and" operation, which clears out the upper bits after the shift operation. Since we have (INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x as a constraint, the RTL template and the split code should be semantically identical. Also, the RTL template here is technically the same as that of "*slliuw" in bitmanip.md, and the split code shows the semantics of an slli.uw operation. You're absolutely right. Not sure how I missed that. I'll try to take another look at this tomorrow AM. It's just too late to try tonight. Thanks, jeff
Re: [PATCH][v2] target/119549 - fixup handling of -mno-sse4 in target attribute
On Tue, Apr 01, 2025 at 10:13:22AM +0200, Richard Biener wrote: > The following fixes ix86_valid_target_attribute_inner_p to properly > handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4. > I've added asserts to ix86_handle_option that RejectNegative is honored > for both. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK? > > Thanks, > Richard. > > PR target/119549 > * common/config/i386/i386-common.cc (ix86_handle_option): > Assert that both OPT_msse4 and OPT_mno_sse4 are never unset. > * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): > Process negated OPT_msse4 as OPT_mno_sse4. > > * gcc.target/i386/pr119549.c: New testcase. LGTM. Jakub
[PATCH] c++/modules: Forbid exposures of TU-local entities in inline variables [PR119551]
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- An inline variable has vague linkage, and needs to be conditionally emitted in TUs that reference it. Unfortunately this clashes with [basic.link] p14.2, which says that we ignore the initialisers of all variables (including inline ones), since importers will not have access to the referenced TU-local entities to write the definition. This patch makes such exposures be ill-formed. One case that continues to work is if the exposure is part of the dynamic initialiser of an inline variable; in such cases, the definition has been built as part of the module interface unit anyway, and importers don't need to write it out again, so such exposures are "harmless". PR c++/119551 gcc/cp/ChangeLog: * module.cc (trees_out::write_var_def): Only ignore non-inline variable initializers. gcc/testsuite/ChangeLog: * g++.dg/modules/internal-5_a.C: Add cases that should be ignored. * g++.dg/modules/internal-5_b.C: Test these new cases, and make the testcase more robust. * g++.dg/modules/internal-11.C: New test. * g++.dg/modules/internal-12_a.C: New test. * g++.dg/modules/internal-12_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 7 +++--- gcc/testsuite/g++.dg/modules/internal-11.C | 24 gcc/testsuite/g++.dg/modules/internal-12_a.C | 13 +++ gcc/testsuite/g++.dg/modules/internal-12_b.C | 14 gcc/testsuite/g++.dg/modules/internal-5_a.C | 8 ++- gcc/testsuite/g++.dg/modules/internal-5_b.C | 6 + 6 files changed, 68 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/internal-11.C create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_a.C create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_b.C diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 894c70f7225..ce22b2ece3f 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -12679,9 +12679,10 @@ trees_in::read_function_def (tree decl, tree maybe_template) void trees_out::write_var_def (tree decl) { - /* The initializer of a variable or variable template is ignored for - determining exposures. */ - auto ovr = make_temp_override (dep_hash->ignore_tu_local, VAR_P (decl)); + /* The initializer of a non-inline variable or variable template is + ignored for determining exposures. */ + auto ovr = make_temp_override (dep_hash->ignore_tu_local, +VAR_P (decl) && !DECL_INLINE_VAR_P (decl)); tree init = DECL_INITIAL (decl); tree_node (init); diff --git a/gcc/testsuite/g++.dg/modules/internal-11.C b/gcc/testsuite/g++.dg/modules/internal-11.C new file mode 100644 index 000..53eb30a62be --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/internal-11.C @@ -0,0 +1,24 @@ +// PR c++/119551 +// { dg-additional-options "-fmodules" } +// { dg-module-cmi !M } + +export module M; + +static int tu_local = 5; +static int* foo() { return &tu_local; } + +// For implementation reasons, we adjust [basic.link] p14.2 to restrict ignored +// exposures to non-inline variables, since for inline variables without +// dynamic initialisation we need to emit their initialiser for importer use. + +int* a = &tu_local; // OK +inline int* b = &tu_local; // { dg-error "exposes TU-local entity" } + +// But dynamic initialisers are fine, importers will just treat them as external. +inline int* c = foo(); // OK + +// For consistency, we follow the same rules with templates, noting that +// we still need to emit definitions with dynamic initializers so we error. +template int* d = &tu_local; // OK +template inline int* e = &tu_local; // { dg-error "exposes TU-local entity" } +template inline int* f = foo(); // { dg-error "exposes TU-local entity" } diff --git a/gcc/testsuite/g++.dg/modules/internal-12_a.C b/gcc/testsuite/g++.dg/modules/internal-12_a.C new file mode 100644 index 000..5c4e7c602b0 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/internal-12_a.C @@ -0,0 +1,13 @@ +// PR c++/119551 +// { dg-additional-options "-fmodules" } +// { dg-module-cmi M } +// Test that emitting variables referencing TU-local entities +// builds and runs correctly. + +export module M; + +static int tu_local_var = 5; +static int* tu_local_func() { return &tu_local_var; } + +export int* a = &tu_local_var; +export inline int* b = tu_local_func(); diff --git a/gcc/testsuite/g++.dg/modules/internal-12_b.C b/gcc/testsuite/g++.dg/modules/internal-12_b.C new file mode 100644 index 000..bc3edf99a11 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/internal-12_b.C @@ -0,0 +1,14 @@ +// PR c++/119551 +// { dg-module-do run } +// { dg-additional-options "-fmodules" } + +import M; + +int main() { + if (*a != 5) +__builtin_abort(); + if (*b != 5) +__builtin_abort(); + if (a != b) +__builtin_abort(); +} diff --git a/gcc/testsuite/g++.dg/m
Re: [PATCH] tailc, v2: Improve tail recursion handling [PR119493]
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > On Tue, Apr 01, 2025 at 10:59:36AM +0200, Jakub Jelinek wrote: > > On Tue, Apr 01, 2025 at 10:46:15AM +0200, Richard Biener wrote: > > > This looks OK, but I wonder if ... > > > > - /* The parameter should be a real operand, so that phi > > > > node > > > > -created for it at the start of the function has the > > > > meaning > > > > -of copying the value. This test implies > > > > is_gimple_reg_type > > > > -from the previous condition, however this one could be > > > > -relaxed by being more careful with copying the new > > > > value > > > > -of the parameter (emitting appropriate GIMPLE_ASSIGN > > > > and > > > > -updating the virtual operands). */ > > > > - if (!is_gimple_reg (param)) > > > > + if (is_gimple_reg_type (TREE_TYPE (param)) > > > > + ? !is_gimple_reg (param) > > > > > > ... we want to restrict this to musttail calls at this point and > > > relax for stage1 only? > > > > I can do that, sure. > > Here it is, ok if it passes bootstrap/regtest? I'll queue the interdiff > between this patch and the previous one for GCC 16. OK. Thanks, Richard. > 2025-04-01 Jakub Jelinek > > PR tree-optimization/119493 > * tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion > if some arguments don't have is_gimple_reg_type, only punt if they > have non-POD types, or volatile, or addressable or (for now) it is > not a musttail call. Set tailr_arg_needs_copy in those cases too. > (eliminate_tail_call): Copy call arguments to params if they don't > have is_gimple_reg_type, use temporaries if the argument is used > later. > (tree_optimize_tail_calls_1): Skip !is_gimple_reg_type > tailr_arg_needs_copy parameters. Formatting fix. > > * gcc.dg/pr119493-1.c: New test. > > --- gcc/tree-tailcall.cc.jj 2025-03-28 10:49:30.0 +0100 > +++ gcc/tree-tailcall.cc 2025-04-01 11:47:31.417637335 +0200 > @@ -676,19 +676,17 @@ find_tail_calls (basic_block bb, struct >have a copyable type and the two arguments must have reasonably >equivalent types. The latter requirement could be relaxed if >we emitted a suitable type conversion statement. */ > - if (!is_gimple_reg_type (TREE_TYPE (param)) > + if (TREE_ADDRESSABLE (TREE_TYPE (param)) > || !useless_type_conversion_p (TREE_TYPE (param), >TREE_TYPE (arg))) > break; > > - /* The parameter should be a real operand, so that phi node > - created for it at the start of the function has the meaning > - of copying the value. This test implies is_gimple_reg_type > - from the previous condition, however this one could be > - relaxed by being more careful with copying the new value > - of the parameter (emitting appropriate GIMPLE_ASSIGN and > - updating the virtual operands). */ > - if (!is_gimple_reg (param)) > + if (is_gimple_reg_type (TREE_TYPE (param)) > + ? !is_gimple_reg (param) > + : (!is_gimple_variable (param) > + || TREE_THIS_VOLATILE (param) > + || may_be_aliased (param) > + || !gimple_call_must_tail_p (call))) > break; > } > } > @@ -938,9 +936,9 @@ find_tail_calls (basic_block bb, struct > param = DECL_CHAIN (param), idx++) > { > tree ddef, arg = gimple_call_arg (call, idx); > - if (is_gimple_reg (param) > - && (ddef = ssa_default_def (cfun, param)) > - && (arg != ddef)) > + if (!is_gimple_reg (param) > + || ((ddef = ssa_default_def (cfun, param)) > + && arg != ddef)) > bitmap_set_bit (tailr_arg_needs_copy, idx); > } > } > @@ -1216,6 +1214,7 @@ eliminate_tail_call (struct tailcall *t, > >/* Add phi node entries for arguments. The ordering of the phi nodes > should > be the same as the ordering of the arguments. */ > + auto_vec copies; >for (param = DECL_ARGUMENTS (current_function_decl), >idx = 0, gpi = gsi_start_phis (first); > param; > @@ -1224,6 +1223,35 @@ eliminate_tail_call (struct tailcall *t, >if (!bitmap_bit_p (tailr_arg_needs_copy, idx)) > continue; > > + if (!is_gimple_reg_type (TREE_TYPE (param))) > + { > + if (param == gimple_call_arg (stmt, idx)) > + continue; > + /* First check if param isn't used by any of the following > + call arguments. If it is, we need to copy first to > + a temporary and only after doing all the assignments copy it > + to param. */ > + size_t idx2 = idx + 1; > + tree
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
Hello, On Mon, 31 Mar 2025, Bill Wendling wrote: > 1. The use of '__self' isn't feasible, so we won't use it. That's a bold statement. How's that? The only thing I read here is that the very spelling of "self" was objected to. So, call it _Self, _Selfref, or something. Even _Whatever42 would be better. Lookup rule changes only for identifiers used in attributes (all of them? some of them?) is ... well, not good language design? I think you used the word "overreaching" upthread, correctly so. And that for wanting to avoid the established member-lookup syntax A.B by way of "we won't use it"? That seems a terrible tradeoff. I don't understand the push for that approach, at all. Ciao, Michael.
Re: [PATCH] combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote: > > > Looking at this some more today, I think we should special case > > > set_noop_p because that can be put into i2 (except for the JUMP_P > > > violations), currently both modified_between_p (pc_rtx, i2, i3) > > > and reg_used_between_p (pc_rtx, i2, i3) returns false. > > > I'll post a patch incrementally for that (but that feels like > > > new optimization, so probably not something that should be backported). > > > > Trying to review this I noticed that both the comment in combine suggests > > that memory set is important and modified_between_p handles MEM_P > > 'x' and for non-REG 'x' possibly MEM operands (CONCAT?) while > > reg_used_between_p only handles REG_P 'reg'. Also shouldn't > > you use reg_referenced_p? At least that function seems to handle > > SET_SRC/SET_DEST separately? > > > > Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why > > does the comment talk about memory? > > I was worried about making too risky changes this late in stage4 > (and especially also for backports). Most of this code is 1992-ish. > I think many of the functions are just misnamed, the reg_ in there doesn't > match what those functions do (bet they initially supported just REGs > and later on support for other kinds of expressions was added, but haven't > done git archeology to prove that). > > What we know for sure is: >&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT >&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART >&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT >&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART > that is checked earlier in the condition. > Then it calls >&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), > XVECEXP (newpat, 0, 0)) >&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), > XVECEXP (newpat, 0, 1)) > While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p > which is also misnamed, that function handles just fine all of > REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT, > STRICT_LOW_PART, PC and even some further cases. > So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG > of REG, PC (at least the REG and PC cases are triggered on the testcase) > and quite possibly also MEM (SUBREG of MEM not, see below). > > Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that > function for constants just returns false, for PC returns true, for REG > returns reg_set_between_p, for MEM recurses on the address, for > MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code > whether the memory could have been modified in between, for all other > rtxes recurses on the subrtxes. This part didn't change in my patch. > > I've only changed those > - && !modified_between_p (SET_DEST (set{1,0}), i2, i3) > + && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3) > where the former has been described above and clearly handles all of > REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things. > > The replacement reg_used_between_p calls reg_overlap_mentioned_p on each > instruction in between i2 and i3. So, there is clearly a difference > in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p > returns unconditionally true even if there are no instructions in between, > but reg_used_between_p if there are no non-debug insns in between returns > false. Sorry for missing that, guess I should check for that (with the > exception of the noop moves which are often (set (pc) (pc)) and handled > by the incremental patch). In fact not just that, reg_used_between_p > will only return true for PC if it is mentioned anywhere in the insns > in between. > Anyway, except for that, for REG it calls refers_to_regno_p > and so should find any occurrences of any of the REG or parts of it for hard > registers, for MEM returns true if it sees any MEMs in insns in between > (conservatively), for SUBREGs apparently it relies on it being SUBREG of REG > (so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the > SUBREG_REG, PC I've already described. > > Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think > already the initial >&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), > XVECEXP (newpat, 0, 0)) >&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), > XVECEXP (newpat, 0, 1)) > calls would have failed --enable-checking=rtl or would have misbehaved, so > I think there is no need to check for it further. > > To your question why I don't use reg_referenced_p, that is because > reg_refere
Re: [PATCH] Libstdc++: Fix bootstrap failure for cross without tm.tm_zone [PR119550]
On Tue, 1 Apr 2025 at 11:34, Tomasz Kaminski wrote: > > > > On Mon, Mar 31, 2025 at 7:28 PM Jonathan Wakely wrote: >> >> In r15-8491-g778c28c70f8573 I added a use of the Autoconf macro >> AC_STRUCT_TIMEZONE, but that requires a link-test for the global tzname >> object if tm.tm_zone isn't supported. That link-test isn't allowed for >> cross-compilation, so bootstrap fails if tm.tm_zone isn't supported. >> >> Since libstdc++ only cares about tm.tm_zone and won't use tzname anyway, >> we don't need the link-test. Replace AC_STRUCT_TIMEZONE with a custom >> macro that only checks for tm.tm_zone. We can improve on the Autoconf >> macro by checking it's a suitable type, which isn't actually checked by >> AC_STRUCT_TIMEZONE. >> >> libstdc++-v3/ChangeLog: >> >> PR libstdc++/119550 >> * acinclude.m4 (GLIBCXX_STRUCT_TM_TM_ZONE): New macro. >> * config.h.in: Regenerate. >> * configure: Regenerate. >> * configure.ac: Use GLIBCXX_STRUCT_TM_TM_ZONE. >> * include/bits/chrono_io.h (__formatter_chrono::_M_c): Check >> _GLIBCXX_USE_STRUCT_TM_TM_ZONE instead of >> _GLIBCXX_HAVE_STRUCT_TM_TM_ZONE. >> --- >> >> Testing x86_64-linux and sparc-solaris2.11, looks fine so far. >> >> libstdc++-v3/acinclude.m4 | 35 >> libstdc++-v3/config.h.in | 21 +-- >> libstdc++-v3/configure| 238 -- >> libstdc++-v3/configure.ac | 5 +- >> libstdc++-v3/include/bits/chrono_io.h | 2 +- >> 5 files changed, 109 insertions(+), 192 deletions(-) >> >> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 >> index e668d2dba27..02fd349e11d 100644 >> --- a/libstdc++-v3/acinclude.m4 >> +++ b/libstdc++-v3/acinclude.m4 >> @@ -5744,6 +5744,41 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [ >>fi >> ]) >> >> +dnl >> +dnl Check for a tm_zone member in struct tm. >> +dnl >> +dnl This member is defined as const char* in Glibc, newlib, POSIX.1-2024, >> +dnl and as char* in BSD (including macOS). >> +dnl >> +dnl Defines: >> +dnl _GLIBCXX_USE_STRUCT_TM_TM_ZONE if struct tm has a tm_zone member. >> +dnl >> +AC_DEFUN([GLIBCXX_STRUCT_TM_TM_ZONE], [ >> + >> + AC_LANG_SAVE > > From documentation this is deprecated in favor of AC_LANG_PUSH. It looks like that is supported by autoconf 2.69 and is already used elsewhere in GCC. We don't currently use it in libstdc++ but it should be safe to do so. >> + AC_LANG_CPLUSPLUS >> + ac_save_CXXFLAGS="$CXXFLAGS" >> + CXXFLAGS="$CXXFLAGS -std=c++20" > > The program that is compiled does not seem to require C++20. > If we change the declaration of "t" to use "= {}", we could do it in C++98. > Any reason to adjust the flags at all? We only need to use the tm_zone member in C++20 mode, and it's possible that on some platform it's not exposed for older standards (very unlikely, but possible). I wanted to test for exactly what we require. I don't feel strongly about it though. >> >> + >> + AC_CACHE_CHECK([for tm_zone member of struct tm], glibcxx_cv_tm_zone, [ >> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([#include >> + ], >> + [struct tm t{}; t.tm_zone = (char*)0;] >> + )], >> + [glibcxx_cv_tm_zone=yes], >> + [glibcxx_cv_tm_zone=no] >> + ) >> +]) >> + >> + if test $glibcxx_cv_tm_zone = yes; then >> +AC_DEFINE(_GLIBCXX_USE_STRUCT_TM_TM_ZONE, 1, >> + [Define if struct tm has a tm_zone member.]) >> + fi >> + >> + CXXFLAGS="$ac_save_CXXFLAGS" >> + AC_LANG_RESTORE >> +]) >> + >> dnl >> dnl Check whether lock tables can be aligned to avoid false sharing. >> dnl >> diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in >> index be151f43dd6..77bbaf1beaa 100644 >> --- a/libstdc++-v3/config.h.in >> +++ b/libstdc++-v3/config.h.in >> @@ -74,10 +74,6 @@ >> don't. */ >> #undef HAVE_DECL_STRNLEN >> >> -/* Define to 1 if you have the declaration of `tzname', and to 0 if you >> don't. >> - */ >> -#undef HAVE_DECL_TZNAME >> - >> /* Define to 1 if you have the header file. */ >> #undef HAVE_DIRENT_H >> >> @@ -412,9 +408,6 @@ >> /* Define to 1 if `d_type' is a member of `struct dirent'. */ >> #undef HAVE_STRUCT_DIRENT_D_TYPE >> >> -/* Define to 1 if `tm_zone' is a member of `struct tm'. */ >> -#undef HAVE_STRUCT_TM_TM_ZONE >> - >> /* Define if strxfrm_l is available in . */ >> #undef HAVE_STRXFRM_L >> >> @@ -506,17 +499,9 @@ >> /* Define to 1 if the target supports thread-local storage. */ >> #undef HAVE_TLS >> >> -/* Define to 1 if your `struct tm' has `tm_zone'. Deprecated, use >> - `HAVE_STRUCT_TM_TM_ZONE' instead. */ >> -#undef HAVE_TM_ZONE >> - >> /* Define if truncate is available in . */ >> #undef HAVE_TRUNCATE >> >> -/* Define to 1 if you don't have `tm_zone' but do have the external array >> - `tzname'. */ >> -#undef HAVE_TZNAME >> - >> /* Define to 1 if you have the header file. */ >> #undef HAVE_UCHAR_H >> >> @@ -605,9 +590,6 @@ >> /* Define to 1 if you have the ANSI C header fil
Re: [PATCH v2 0/4] Libsanitizer improvements
On Mon, Mar 17, 2025 at 2:24 PM Aleksandar Rakic wrote: > > This patch series improves the libsanitizer for the mips target > in GCC. You should send these improvements to upstream libsanitizer instead. > These patches are cherry-picked from the mips_rel/11_2_0/master > and mips_rel/9_3_0/master branches from the MIPS' repository: > https://github.com/MIPS/gcc . > Further details on the individual changes are included in the > respective patches.
[PATCH] tailr: Punt on tail recursions that would break musttail [PR119493]
Hi! While working on the previous tailc patch, I've noticed the following problem. The testcase below fails, because we decide to tail recursion optimize the call, but tail recursion (as documented in tree-tailcall.cc) needs to add some result multiplication and/or addition if any tail recursion uses accumulator, which is added right before the return. So, if there are musttail non-recurive calls in the function, successful tail recursion optimization will mean we'll later error on the musttail calls. musttail recursive calls are ok, those would be tail recursion optimized. So, the following patch punts on all tail recursion optimizations if it needs accumulators (add and/or mult) if there is at least one non-recursive musttail call. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-04-01 Jakub Jelinek PR tree-optimization/119493 * tree-tailcall.cc (tree_optimize_tail_calls_1): Ignore tail recursion candidates which need accumulators if there is at least one musttail non-recursive call. * gcc.dg/pr119493-2.c: New test. --- gcc/tree-tailcall.cc.jj 2025-03-31 13:00:58.288725496 +0200 +++ gcc/tree-tailcall.cc2025-03-31 14:34:31.285106333 +0200 @@ -1373,6 +1373,38 @@ tree_optimize_tail_calls_1 (bool opt_tai live_vars = NULL; } + if (cfun->has_musttail) +{ + /* We can't mix non-recursive must tail calls with tail recursive +calls which require accumulators, because in that case we have to +emit code in between the musttail calls and return, which prevent +calling them as tail calls. So, in that case give up on the +tail recursion. */ + for (act = tailcalls; act; act = act->next) + if (!act->tail_recursion) + { + gcall *call = as_a (gsi_stmt (act->call_gsi)); + if (gimple_call_must_tail_p (call)) + break; + } + if (act) + for (struct tailcall **p = &tailcalls; *p; ) + { + if ((*p)->tail_recursion && ((*p)->add || (*p)->mult)) + { + struct tailcall *a = *p; + *p = (*p)->next; + gcall *call = as_a (gsi_stmt (a->call_gsi)); + maybe_error_musttail (call, + _("tail recursion with accumulation " + "mixed with musttail " + "non-recursive call"), diag_musttail); + free (a); + } + else + p = &(*p)->next; + } +} /* Construct the phi nodes and accumulators if necessary. */ a_acc = m_acc = NULL_TREE; for (act = tailcalls; act; act = act->next) --- gcc/testsuite/gcc.dg/pr119493-2.c.jj2025-03-31 14:41:18.977464154 +0200 +++ gcc/testsuite/gcc.dg/pr119493-2.c 2025-03-31 14:45:24.327068682 +0200 @@ -0,0 +1,22 @@ +/* PR tree-optimization/119493 */ +/* { dg-do compile { target musttail } } */ +/* { dg-options "-O2 -fdump-tree-tailr1-details" } */ +/* { dg-final { scan-tree-dump-times "tail recursion with accumulation mixed with musttail non-recursive call" 2 "tailr1" } } */ + +[[gnu::noipa]] int +bar (int x, int y) +{ + return x + y; +} + +[[gnu::noinline, gnu::noclone]] int +foo (int x, int y) +{ + if (x < 10) +[[gnu::musttail]] return bar (x, y); + if (y & 2) +return foo (x - 1, y) * 2; + if (y & 1) +[[gnu::musttail]] return foo (x - 1, y); + return foo (x - 1, y) * 3; +} Jakub
Re: [PATCH] phiprop: Avoid proping loads into loops [PR116835]
On Tue, Apr 1, 2025 at 6:10 AM Andrew Pinski wrote: > > phiprop can sometimes prop loads back into loops > and in some cases cause wrong code when the load > was from a weak symbol as now it becomes an unconditional > load before the loop. > > Bootstrapped and tested on x86_64-linux-gnu with no regressions. > > PR tree-optimization/116835 > > gcc/ChangeLog: > > * tree-ssa-phiprop.cc (propagate_with_phi): Check > the use is at the same or deeper loop depth than > the phi node. > (pass_phiprop::execute): Initialize loops. > > gcc/testsuite/ChangeLog: > > * gcc.dg/torture/pr116835.c: New test. > > Signed-off-by: Andrew Pinski > --- > gcc/testsuite/gcc.dg/torture/pr116835.c | 33 + > gcc/tree-ssa-phiprop.cc | 14 ++- > 2 files changed, 46 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr116835.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr116835.c > b/gcc/testsuite/gcc.dg/torture/pr116835.c > new file mode 100644 > index 000..31d3b59d945 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr116835.c > @@ -0,0 +1,33 @@ > +/* { dg-do run { target { weak_undefined } } } */ > +/* { dg-add-options weak_undefined } */ > + > +/* PR tree-optimization/116835 */ > +/* phiprop would prop into the loop the load of b > + and also prop the load of a before the loop. > + Which is incorrect as a is a weak symbol. */ > + > +struct s1 > +{ > + int t; > + int t1; > +}; > +typedef struct s1 type; > +extern type a __attribute__((weak)); > +int t; > +type b; > +type bar (int c) __attribute__((noipa, noinline)); > +type > +bar (int c) > +{ > + t = 1; > + type *p = &a; > + for (int j = 0; j < c; ++j) > +p = &b; > + return *p; > +} > + > +int main(void) > +{ > + if (bar(&a == nullptr).t) > + __builtin_abort(); > +} > diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc > index a2e1fb16a30..c5fe231d3e9 100644 > --- a/gcc/tree-ssa-phiprop.cc > +++ b/gcc/tree-ssa-phiprop.cc > @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-ssa-loop.h" > #include "tree-cfg.h" > #include "tree-ssa-dce.h" > +#include "cfgloop.h" > > /* This pass propagates indirect loads through the PHI node for its > address to make the load source possibly non-addressable and to > @@ -348,11 +349,17 @@ propagate_with_phi (basic_block bb, gphi *phi, struct > phiprop_d *phivn, > calculate_dominance_info (CDI_POST_DOMINATORS); > >/* Only replace loads in blocks that post-dominate the PHI node. That > - makes sure we don't end up speculating loads. */ > +makes sure we don't end up speculating loads. */ >if (!dominated_by_p (CDI_POST_DOMINATORS, >bb, gimple_bb (use_stmt))) > continue; I think the testcase shows the above test is not sufficient to avoid doing invalid speculation. We need to verify the assumption that the PHI is dereferenced each time it is "executed". With cycles SSA form and post-dominance are not helpful here. I think we have to amend the post-dominance check to deal with SSA cycles, but your check does not look fully correct (and its comment is misleading). We need to constrain the load to be in the same or a subloop (use flow_loop_nested_p, not loop depth) or in the same BB when either the load or the PHI is in an irreducible region. > > + /* Only replace loads in blocks are in the same loop > +are inside an deeper loop. This is to make sure not > +to prop back into the loop. */ > + if (bb_loop_depth (gimple_bb (use_stmt)) < bb_loop_depth (bb)) > + continue; > + So something like /* Amend the post-dominance check for SSA cycles, we need to make sure each PHI result value is dereferenced. */ if (!(gimple_bb (use_stmt) == bb || (!(bb->flags & BB_IRREDUCIBLE_LOOP) && !(gimple_bb (use_stmt)->flags & BB_IRREDUCIBLE_LOOP) && (bb->loop_father == gimple_bb (use_stmt)->loop_father || flow_loop_nested_p (bb->loop_father, gimple_bb (use_stmt)->loop_father) continue; >/* Check whether this is a load of *ptr. */ >if (!(is_gimple_assign (use_stmt) > && gimple_assign_rhs_code (use_stmt) == MEM_REF > @@ -520,6 +527,10 @@ pass_phiprop::execute (function *fun) >size_t n; >auto_bitmap dce_ssa_names; > > + /* Find the loops, so that we can prevent moving loads in > + them. */ > + loop_optimizer_init (AVOID_CFG_MODIFICATIONS); > + >calculate_dominance_info (CDI_DOMINATORS); > >n = num_ssa_names; > @@ -548,6 +559,7 @@ pass_phiprop::execute (function *fun) >free (phivn); > >free_dominance_info (CDI_POST_DOMINATORS); > + loop_optimizer_finalize (); > >return did_something ? TODO_update_ssa_only_virtuals : 0; > } > -- > 2.43.0 >
[PATCH][v2] target/119549 - fixup handling of -mno-sse4 in target attribute
The following fixes ix86_valid_target_attribute_inner_p to properly handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4. I've added asserts to ix86_handle_option that RejectNegative is honored for both. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK? Thanks, Richard. PR target/119549 * common/config/i386/i386-common.cc (ix86_handle_option): Assert that both OPT_msse4 and OPT_mno_sse4 are never unset. * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Process negated OPT_msse4 as OPT_mno_sse4. * gcc.target/i386/pr119549.c: New testcase. --- gcc/common/config/i386/i386-common.cc| 2 ++ gcc/config/i386/i386-options.cc | 7 +++ gcc/testsuite/gcc.target/i386/pr119549.c | 15 +++ 3 files changed, 24 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr119549.c diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 80aec3244bc..4815fbc4d35 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -1519,11 +1519,13 @@ ix86_handle_option (struct gcc_options *opts, return true; case OPT_msse4: + gcc_assert (value != 0); opts->x_ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET; opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET; return true; case OPT_mno_sse4: + gcc_assert (value != 0); opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_SSE4_UNSET; opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_UNSET; opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_SSE4_UNSET; diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 91f59e16fd5..a9fac011f3d 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -1271,6 +1271,13 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], } } + /* Fixup -msse4 which is RejectNegative to -mno-sse4 when negated. */ + if (opt == OPT_msse4 && !opt_set_p) + { + opt = OPT_mno_sse4; + opt_set_p = true; + } + /* Process the option. */ if (opt == N_OPTS) { diff --git a/gcc/testsuite/gcc.target/i386/pr119549.c b/gcc/testsuite/gcc.target/i386/pr119549.c new file mode 100644 index 000..a465bec3cf0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119549.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-msse4" } */ + +typedef long long v2di __attribute__((vector_size(16))); + +static inline __attribute__((always_inline)) +int rte_trace_feature_is_enabled() { return 1; } /* { dg-error "inlining failed" } */ + +void __attribute__((target ("no-sse3"))) __attribute__((target ("no-sse4"))) +rte_eal_trace_generic_void_init(void) +{ + if (!rte_trace_feature_is_enabled()) return; + __asm__ volatile ("" : : : "memory"); +} + -- 2.43.0
Re: [PATCH] tailc: Improve tail recursion handling [PR119493]
On Tue, Apr 01, 2025 at 10:46:15AM +0200, Richard Biener wrote: > This looks OK, but I wonder if ... > > - /* The parameter should be a real operand, so that phi node > > -created for it at the start of the function has the meaning > > -of copying the value. This test implies is_gimple_reg_type > > -from the previous condition, however this one could be > > -relaxed by being more careful with copying the new value > > -of the parameter (emitting appropriate GIMPLE_ASSIGN and > > -updating the virtual operands). */ > > - if (!is_gimple_reg (param)) > > + if (is_gimple_reg_type (TREE_TYPE (param)) > > + ? !is_gimple_reg (param) > > ... we want to restrict this to musttail calls at this point and > relax for stage1 only? I can do that, sure. Jakub
Re: [PATCH] tailr: Punt on tail recursions that would break musttail [PR119493]
On Tue, Apr 01, 2025 at 10:50:01AM +0200, Richard Biener wrote: > > While working on the previous tailc patch, I've noticed the following > > problem. > > The testcase below fails, because we decide to tail recursion optimize > > the call, but tail recursion (as documented in tree-tailcall.cc) needs to > > add some result multiplication and/or addition if any tail recursion uses > > accumulator, which is added right before the return. > > So, if there are musttail non-recurive calls in the function, successful > > tail recursion optimization will mean we'll later error on the musttail > > calls. musttail recursive calls are ok, those would be tail recursion > > optimized. > > > > So, the following patch punts on all tail recursion optimizations if it > > needs accumulators (add and/or mult) if there is at least one non-recursive > > musttail call. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > OK. > > Do we update cfun->has_musttail at some point for optimized away > musttail calls with respect to inlining/DCE? I do wonder whether I think we don't update it at all. During inlining that is not a problem, because we clear musttail in functions being inlined unless the call being inlined was musttail too and in that case the caller should have cfun->has_musttail already. And I think it isn't a big deal when it might be pessimistically set even when the musttail calls are DCEd, in tree-tailcall.cc we can perform some extra work if cfun->has_musttail is set, and in case of -O0/-Og even that means running an entirely new pass, but it shouldn't be that expensive and if we don't actually find any musttail calls, it behaves pretty much the same as if cfun->has_musttail wasn't set (there are some cases where with !cfun->has_musttail we punt earlier than with it, but if there are no musttail calls, it shouldn't make a difference in the end except for perhaps different details in -fdump-tree-tail{r,c}-details dump. Jakub
[PATCH] tailc, v2: Improve tail recursion handling [PR119493]
On Tue, Apr 01, 2025 at 10:59:36AM +0200, Jakub Jelinek wrote: > On Tue, Apr 01, 2025 at 10:46:15AM +0200, Richard Biener wrote: > > This looks OK, but I wonder if ... > > > - /* The parameter should be a real operand, so that phi node > > > - created for it at the start of the function has the meaning > > > - of copying the value. This test implies is_gimple_reg_type > > > - from the previous condition, however this one could be > > > - relaxed by being more careful with copying the new value > > > - of the parameter (emitting appropriate GIMPLE_ASSIGN and > > > - updating the virtual operands). */ > > > - if (!is_gimple_reg (param)) > > > + if (is_gimple_reg_type (TREE_TYPE (param)) > > > + ? !is_gimple_reg (param) > > > > ... we want to restrict this to musttail calls at this point and > > relax for stage1 only? > > I can do that, sure. Here it is, ok if it passes bootstrap/regtest? I'll queue the interdiff between this patch and the previous one for GCC 16. 2025-04-01 Jakub Jelinek PR tree-optimization/119493 * tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion if some arguments don't have is_gimple_reg_type, only punt if they have non-POD types, or volatile, or addressable or (for now) it is not a musttail call. Set tailr_arg_needs_copy in those cases too. (eliminate_tail_call): Copy call arguments to params if they don't have is_gimple_reg_type, use temporaries if the argument is used later. (tree_optimize_tail_calls_1): Skip !is_gimple_reg_type tailr_arg_needs_copy parameters. Formatting fix. * gcc.dg/pr119493-1.c: New test. --- gcc/tree-tailcall.cc.jj 2025-03-28 10:49:30.0 +0100 +++ gcc/tree-tailcall.cc2025-04-01 11:47:31.417637335 +0200 @@ -676,19 +676,17 @@ find_tail_calls (basic_block bb, struct have a copyable type and the two arguments must have reasonably equivalent types. The latter requirement could be relaxed if we emitted a suitable type conversion statement. */ - if (!is_gimple_reg_type (TREE_TYPE (param)) + if (TREE_ADDRESSABLE (TREE_TYPE (param)) || !useless_type_conversion_p (TREE_TYPE (param), TREE_TYPE (arg))) break; - /* The parameter should be a real operand, so that phi node -created for it at the start of the function has the meaning -of copying the value. This test implies is_gimple_reg_type -from the previous condition, however this one could be -relaxed by being more careful with copying the new value -of the parameter (emitting appropriate GIMPLE_ASSIGN and -updating the virtual operands). */ - if (!is_gimple_reg (param)) + if (is_gimple_reg_type (TREE_TYPE (param)) + ? !is_gimple_reg (param) + : (!is_gimple_variable (param) +|| TREE_THIS_VOLATILE (param) +|| may_be_aliased (param) +|| !gimple_call_must_tail_p (call))) break; } } @@ -938,9 +936,9 @@ find_tail_calls (basic_block bb, struct param = DECL_CHAIN (param), idx++) { tree ddef, arg = gimple_call_arg (call, idx); - if (is_gimple_reg (param) - && (ddef = ssa_default_def (cfun, param)) - && (arg != ddef)) + if (!is_gimple_reg (param) + || ((ddef = ssa_default_def (cfun, param)) + && arg != ddef)) bitmap_set_bit (tailr_arg_needs_copy, idx); } } @@ -1216,6 +1214,7 @@ eliminate_tail_call (struct tailcall *t, /* Add phi node entries for arguments. The ordering of the phi nodes should be the same as the ordering of the arguments. */ + auto_vec copies; for (param = DECL_ARGUMENTS (current_function_decl), idx = 0, gpi = gsi_start_phis (first); param; @@ -1224,6 +1223,35 @@ eliminate_tail_call (struct tailcall *t, if (!bitmap_bit_p (tailr_arg_needs_copy, idx)) continue; + if (!is_gimple_reg_type (TREE_TYPE (param))) + { + if (param == gimple_call_arg (stmt, idx)) + continue; + /* First check if param isn't used by any of the following +call arguments. If it is, we need to copy first to +a temporary and only after doing all the assignments copy it +to param. */ + size_t idx2 = idx + 1; + tree param2 = DECL_CHAIN (param); + for (; param2; param2 = DECL_CHAIN (param2), idx2++) + if (!is_gimple_reg_type (TREE_TYPE (param))) + { + tree base = get_base_address (gimple_call_arg (stmt, idx2)); +
Re: [pushed] Only write gcov when file output is on [PR119553]
On Mon, Mar 31, 2025 at 7:54 PM Jørgen Kvalsvik wrote: > > gcov_write_* functions must be guarded so they only are called when > output_to_file is true, like for -fcondition-coverage, otherwise it > triggers an invalid read as detected by valgrind. The gcno file is > mostly written to from profile.cc, so it doesn't make too much sense > to hide it in path-coverage.cc. The find_paths name was a bit > imprecise, and is renamed to instrument_prime_paths. OK. Richard. > PR gcov-profile/119553 > > gcc/ChangeLog: > > * path-coverage.cc (find_paths): Return path count, don't > write to gcno, and rename to ... > (instrument_prime_paths): ... this. > * profile.cc (branch_prob): Write path counts to gcno. > --- > gcc/path-coverage.cc | 15 ++- > gcc/profile.cc | 12 ++-- > 2 files changed, 16 insertions(+), 11 deletions(-) > > diff --git a/gcc/path-coverage.cc b/gcc/path-coverage.cc > index 55b39298603..55058fd04ff 100644 > --- a/gcc/path-coverage.cc > +++ b/gcc/path-coverage.cc > @@ -504,8 +504,8 @@ flush_on_gsi (gimple_stmt_iterator *gsi, size_t bucket, > tree local, tree mask, > bit N%64 in bucket N/64. For large functions, an individual basic block > will only be part of a small subset of paths, and by extension buckets and > local counters. Only the necessary counters are read and written. */ > -void > -find_paths (struct function *fn) > +unsigned > +instrument_prime_paths (struct function *fn) > { >mark_dfs_back_edges (fn); >vec> paths = find_prime_paths (fn); > @@ -515,7 +515,7 @@ find_paths (struct function *fn) >warning_at (fn->function_start_locus, OPT_Wcoverage_too_many_paths, > "paths exceeding limit, giving up path coverage"); >release_vec_vec (paths); > - return; > + return 0; > } > >tree gcov_type_node = get_gcov_type (); > @@ -526,14 +526,9 @@ find_paths (struct function *fn) >if (!coverage_counter_alloc (GCOV_COUNTER_PATHS, nbuckets)) > { >release_vec_vec (paths); > - return; > + return 0; > } > > - gcov_position_t offset {}; > - offset = gcov_write_tag (GCOV_TAG_PATHS); > - gcov_write_unsigned (paths.length ()); > - gcov_write_length (offset); > - >hash_map ands; >hash_map iors; >hash_map flushes; > @@ -771,6 +766,8 @@ find_paths (struct function *fn) > } > } > > + const unsigned npaths = paths.length (); >blocks.release (); >release_vec_vec (paths); > + return npaths; > } > diff --git a/gcc/profile.cc b/gcc/profile.cc > index 0b222cf3864..c0f5097726b 100644 > --- a/gcc/profile.cc > +++ b/gcc/profile.cc > @@ -1611,9 +1611,17 @@ branch_prob (bool thunk) > instrument_values (values); > } > > - void find_paths (struct function*); > + unsigned instrument_prime_paths (struct function*); >if (path_coverage_flag) > -find_paths (cfun); > +{ > + const unsigned npaths = instrument_prime_paths (cfun); > + if (output_to_file) > + { > + gcov_position_t offset = gcov_write_tag (GCOV_TAG_PATHS); > + gcov_write_unsigned (npaths); > + gcov_write_length (offset); > + } > +} > >free_aux_for_edges (); > > -- > 2.39.5 >
[PATCH v2] RISC-V: Tweak testcase for PIE
Linux toolchain may configured with --enable-default-pie, and that will cause lots of regression test failures because the function name will append with @plt suffix (e.g. `call foo` become `call foo@plt`), also some code generation will different due to the code model like the address generation for global variable, so we may add -fno-pie to those testcases to prevent that. We may consider just drop @plt suffix to prevent that at all, because it's not difference between w/ and w/o @plt suffix, the linker will pick the right one to do, however it's late stage of GCC development, so just tweak the testcase should be the best way to do now. Changes from v1: - Add more testcase for PIE (from rvv.exp). - Tweak the rule for match @plt. gcc/testsuite/ChangeLog: * gcc.target/riscv/rv32i_zcmp.c: Tweak testcase for PIE. * gcc.target/riscv/rv32e_zcmp.c: Likewise. * gcc.target/riscv/zcmp_stack_alignment.c: Likewise. * gcc.target/riscv/cm_mv_rv32.c: Likewise. * gcc.target/riscv/cpymem-64.c: Likewise. * gcc.target/riscv/fmax-snan.c: Likewise. * gcc.target/riscv/fmaxf-snan.c: Likewise. * gcc.target/riscv/fmin-snan.c: Likewise. * gcc.target/riscv/fminf-snan.c: Likewise. * gcc.target/riscv/large-model.c: Likewise. * gcc.target/riscv/predef-1.c: Likewise. * gcc.target/riscv/predef-4.c: Likewise. * gcc.target/riscv/predef-7.c: Likewise. * gcc.target/riscv/predef-9.c: Likewise. * gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: Likewise. * gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise. * gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Likewise. * gcc.target/riscv/rvv/base/cmpmem-1.c: Likewise. * gcc.target/riscv/rvv/base/cmpmem-3.c: Likewise. * gcc.target/riscv/rvv/base/cmpmem-4.c: Likewise. * gcc.target/riscv/rvv/base/cpymem-1.c: Likewise. * gcc.target/riscv/rvv/base/movmem-1.c: Likewise. * gcc.target/riscv/rvv/base/pr114352-3.c: Likewise. * gcc.target/riscv/rvv/base/setmem-1.c: Likewise. * gcc.target/riscv/rvv/base/setmem-2.c: Likewise. * gcc.target/riscv/rvv/base/setmem-3.c: Likewise. * gcc.target/riscv/rvv/base/spill-9.c: Likewise. * g++.target/riscv/mv-symbols1.C: Likewise. * g++.target/riscv/mv-symbols3.C: Likewise. * g++.target/riscv/mv-symbols4.C: Likewise. * g++.target/riscv/mv-symbols5.C: Likewise. * g++.target/riscv/mvc-symbols1.C: Likewise. * g++.target/riscv/mvc-symbols3.C: Likewise. --- gcc/testsuite/g++.target/riscv/mv-symbols1.C| 4 ++-- gcc/testsuite/g++.target/riscv/mv-symbols3.C| 4 ++-- gcc/testsuite/g++.target/riscv/mv-symbols4.C| 4 ++-- gcc/testsuite/g++.target/riscv/mv-symbols5.C| 4 ++-- gcc/testsuite/g++.target/riscv/mvc-symbols1.C | 4 ++-- gcc/testsuite/g++.target/riscv/mvc-symbols3.C | 4 ++-- gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 4 ++-- gcc/testsuite/gcc.target/riscv/cpymem-64.c | 4 ++-- gcc/testsuite/gcc.target/riscv/fmax-snan.c | 2 +- gcc/testsuite/gcc.target/riscv/fmaxf-snan.c | 2 +- gcc/testsuite/gcc.target/riscv/fmin-snan.c | 2 +- gcc/testsuite/gcc.target/riscv/fminf-snan.c | 2 +- gcc/testsuite/gcc.target/riscv/large-model.c| 2 +- gcc/testsuite/gcc.target/riscv/predef-1.c | 2 +- gcc/testsuite/gcc.target/riscv/predef-4.c | 2 +- gcc/testsuite/gcc.target/riscv/predef-7.c | 2 +- gcc/testsuite/gcc.target/riscv/predef-9.c | 2 +- gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c | 6 +++--- gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c | 6 +++--- .../riscv/rvv/base/abi-callee-saved-2-save-restore.c| 6 +++--- .../gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c | 6 +++--- .../gcc.target/riscv/rvv/base/abi-callee-saved-2.c | 6 +++--- gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c | 2 +- gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c| 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c | 2 +- gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c | 2 +- gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c | 2 +- gcc/testsuite/gcc.target/riscv/rvv/base/spill-9.c | 2 +- gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c | 2 +- 33 files changed, 56 insertions(+), 56 deletions(-) diff --git a/gcc/testsuite/g++.targ
Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]
On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu wrote: > > On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > > > Hi, > > > > For spiltter after 3_mask it now splits the pattern > > to *3_mask, causing the splitter doesn't generate > > nf variant. Add corresponding nf counterpart for define_insn_and_split > > to make the splitter also works for nf insn. > > > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > > > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR target/119539 > > * config/i386/i386.md (*3_mask_nf): New > > define_insn_and_split. > > (*3_mask_1_nf): Likewise. > > (*3_mask): Use force_lowpart_subreg. > > > > gcc/testsuite/ChangeLog: > > > > PR target/119539 > > * gcc.target/i386/apx-nf-pr119539.c: New test. > > --- > > gcc/config/i386/i386.md | 46 ++- > > .../gcc.target/i386/apx-nf-pr119539.c | 6 +++ > > 2 files changed, 50 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > index f7f790d2aeb..42312f0c330 100644 > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -18131,6 +18131,30 @@ (define_expand "3" > >DONE; > > }) > > > > +;; Avoid useless masking of count operand. > > +(define_insn_and_split "*3_mask_nf" > > + [(set (match_operand:SWI 0 "nonimmediate_operand") > > + (any_rotate:SWI > > + (match_operand:SWI 1 "nonimmediate_operand") > > + (subreg:QI > > + (and > > + (match_operand 2 "int248_register_operand" "c") > > + (match_operand 3 "const_int_operand")) 0)))] > > + "TARGET_APX_NF > > + && ix86_binary_operator_ok (, mode, operands) > > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) > > + == GET_MODE_BITSIZE (mode)-1 > > + && ix86_pre_reload_split ()" > > + "#" > > + "&& 1" > > + [(set (match_dup 0) > > + (any_rotate:SWI (match_dup 1) > > + (match_dup 2)))] > > +{ > > + operands[2] = force_lowpart_subreg (QImode, operands[2], > > + GET_MODE (operands[2])); > > +}) > Can we just change the output in original pattern, I think combine > will still match the pattern even w/ clobber flags. > > like > > @@ -17851,8 +17851,17 @@ (define_insn_and_split "*3_mask" > (match_dup 2))) >(clobber (reg:CC FLAGS_REG))])] > { > - operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > - operands[2] = gen_lowpart (QImode, operands[2]); > + if (TARGET_APX_F) > +{ > + emit_move_insn (operands[0], > +gen_rtx_ (mode, operands[1], operands[2])); > + DONE; > +} > + else > +{ > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]); Please note we have a new "force_lowpart_subreg" function that operates on a register operand and (if possible) simplifies a subreg of a subreg, otherwise it forces the operand to register and creates a subreg of the temporary. Similar to the above combination, with a possibility to avoid a temporary reg. > Also we can remove constraint "c" in the original pattern. The constraint is here to handle corner case, where combine propagated an insn RTX with fixed register, other than %ecx, into shift RTX. Register allocator was not able to fix the combination, so TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected unwanted combinations. Please see the comment in ix86_legitimate_combined_insn function. Perhaps the above is not relevant anymore with the new register allocator (LRA), and the constraint can indeed be removed. But please take some caution. Uros.
Re: [PATCH] combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
On Fri, 28 Mar 2025, Jakub Jelinek wrote: > Hi! > > The following testcase is miscompiled on x86_64-linux at -O2 by the combiner. > We have from earlier combinations > (insn 22 21 23 4 (set (reg:SI 104 [ _7 ]) > (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal} > (nil)) > (insn 23 22 24 4 (set (reg/v:SI 117 [ e ]) > (reg/v:SI 116 [ e ])) 96 {*movsi_internal} > (expr_list:REG_DEAD (reg/v:SI 116 [ e ]) > (nil))) > (note 24 23 25 4 NOTE_INSN_DELETED) > (insn 25 24 26 4 (parallel [ > (set (reg:CCZ 17 flags) > (compare:CCZ (neg:SI (reg:SI 104 [ _7 ])) > (const_int 0 [0]))) > (set (reg/v:SI 116 [ e ]) > (neg:SI (reg:SI 104 [ _7 ]))) > ]) "pr119291.c":26:13 977 {*negsi_2} > (expr_list:REG_DEAD (reg:SI 104 [ _7 ]) > (nil))) > (note 26 25 27 4 NOTE_INSN_DELETED) > (insn 27 26 28 4 (set (reg:DI 128 [ _9 ]) > (ne:DI (reg:CCZ 17 flags) > (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1} > (expr_list:REG_DEAD (reg:CCZ 17 flags) > (nil))) > and try_combine is called on i3 25 and i2 22 (second time) > and reach the hunk being patched with simplified i3 > (insn 25 24 26 4 (parallel [ > (set (pc) > (pc)) > (set (reg/v:SI 116 [ e ]) > (const_int 0 [0])) > ]) "pr119291.c":28:13 977 {*negsi_2} > (expr_list:REG_DEAD (reg:SI 104 [ _7 ]) > (nil))) > and > (insn 22 21 23 4 (set (reg:SI 104 [ _7 ]) > (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal} > (nil)) > Now, the try_combine code there attempts to split two independent > sets in newpat by moving one of them to i2. > And among other tests it checks > !modified_between_p (SET_DEST (set1), i2, i3) > which is certainly needed, if there would be say > (set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a])) > in between i2 and i3, we couldn't do that, as that set would overwrite > the value set by set1 we want to move to the i2 position. > But in this case pseudo 116 isn't set in between i2 and i3, but used > (and additionally there is a REG_DEAD note for it). > > This is equally bad for the move, because while the i3 insn > and later will see the pseudo value that we set, the insn in between > which uses the value will see a different value from the one that > it should see. > > As we don't check for that, in the end try_combine succeeds and > changes the IL to: > (insn 22 21 23 4 (set (reg/v:SI 116 [ e ]) > (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal} > (nil)) > (insn 23 22 24 4 (set (reg/v:SI 117 [ e ]) > (reg/v:SI 116 [ e ])) 96 {*movsi_internal} > (expr_list:REG_DEAD (reg/v:SI 116 [ e ]) > (nil))) > (note 24 23 25 4 NOTE_INSN_DELETED) > (insn 25 24 26 4 (set (pc) > (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE} > (nil)) > (note 26 25 27 4 NOTE_INSN_DELETED) > (insn 27 26 28 4 (set (reg:DI 128 [ _9 ]) > (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal} > (nil)) > (note, the i3 got turned into a nop and try_combine also modified insn 27). > > The following patch replaces the modified_between_p > tests with reg_used_between_p, my understanding is that > modified_between_p is a subset of reg_used_between_p, so one > doesn't need both. > > Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux, > powerpc64le-linux and s390x-linux, ok for trunk? > > Looking at this some more today, I think we should special case > set_noop_p because that can be put into i2 (except for the JUMP_P > violations), currently both modified_between_p (pc_rtx, i2, i3) > and reg_used_between_p (pc_rtx, i2, i3) returns false. > I'll post a patch incrementally for that (but that feels like > new optimization, so probably not something that should be backported). Trying to review this I noticed that both the comment in combine suggests that memory set is important and modified_between_p handles MEM_P 'x' and for non-REG 'x' possibly MEM operands (CONCAT?) while reg_used_between_p only handles REG_P 'reg'. Also shouldn't you use reg_referenced_p? At least that function seems to handle SET_SRC/SET_DEST separately? Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why does the comment talk about memory? > 2025-03-28 Jakub Jelinek > > PR rtl-optimization/119291 > * combine.cc (try_combine): For splitting of PARALLEL with > 2 independent SETs into i2 and i3 sets check reg_used_between_p > of the SET_DESTs rather than just modified_between_p. > > * gcc.c-torture/execute/pr119291.c: New test. > > --- gcc/combine.cc.jj 2025-03-25 09:34:33.469102343 +0100 > +++ gcc/combine.cc2025-03-27 09:50:15.768567383 +0100 > @@ -4012,18 +4012,18 @@ try_combine (rtx_insn *i3, rtx_insn *i2, >rtx set1 = XVECEXP (newpat, 0, 1); > >/* Normally, it doesn't matter which of the two is done first, b
Re: [PATCH] testsuite: arm: Fix dg-final in short-vfp-1.c [PR119556]
On 31/03/2025 20:04, Christophe Lyon wrote: > Recent syntactic fixes enabled the test, but the result was failing. > > It turns out it was missing a space between the register arguments in > the scan-assembler-times directives. > > gcc/testsuite/ChangeLog: > > PR target/119556 > * gcc.target/arm/short-vfp-1.c: Add missing spaces. > --- > gcc/testsuite/gcc.target/arm/short-vfp-1.c | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/arm/short-vfp-1.c > b/gcc/testsuite/gcc.target/arm/short-vfp-1.c > index 18d38a58037..f6866c4f601 100644 > --- a/gcc/testsuite/gcc.target/arm/short-vfp-1.c > +++ b/gcc/testsuite/gcc.target/arm/short-vfp-1.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-require-effective-target arm_vfp_ok } > +/* { dg-require-effective-target arm_vfp_ok } */ > /* { dg-add-options arm_vfp } */ > > int > @@ -38,8 +38,8 @@ test_sihi (short x) >return (int)x; > } > > -/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+,s[0-9]+} 2 } } > */ > -/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+,s[0-9]+} 2 } } > */ > -/* { dg-final { scan-assembler-times {vmov\tr[0-9]+,s[0-9]+} 2 } } */ > -/* { dg-final { scan-assembler-times {vmov\ts[0-9]+,r[0-9]+} 2 } } */ > -/* { dg-final { scan-assembler-times {sxth\tr[0-9]+,r[0-9]+} 2 } } */ > +/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 2 } > } */ > +/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 2 } > } */ > +/* { dg-final { scan-assembler-times {vmov\tr[0-9]+, s[0-9]+} 2 } } */ > +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, r[0-9]+} 2 } } */ > +/* { dg-final { scan-assembler-times {sxth\tr[0-9]+, r[0-9]+} 2 } } */ OK. Changes like this should be considered obvious, BTW. R.
Re: [PATCH] target/119549 - fixup handling of -mno-sse4
On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > >Changing ix86_valid_target_attribute_inner_p might be even better because > >OPT_msse4 is RejectNegative option, so !value for it looks weird. > msse4 is defined as ix86_opt_isa in ix86_valid_target_attribute_inner_p > > 1055IX86_ATTR_ISA ("sse4", OPT_msse4), > > and would be handled in ix86_handle_option > > 1282 else if (type == ix86_opt_isa) > 1283{ > 1284 struct cl_decoded_option decoded; > 1285 > 1286 generate_option (opt, NULL, opt_set_p, CL_TARGET, > &decoded); > 1287 ix86_handle_option (opts, opts_set, > 1288 &decoded, input_location); > 1289} > > So I think it's already correct here. The entries are msse4 Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation. mno-sse4 Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save Do not support SSE4.1 and SSE4.2 built-in functions and code generation. so if it is turned into something without RejectNegative, it will be wrong in the Mask argument, either for the positive or for the negative case. I wonder though if msse4 should not be a Target RejectNegative Alias to msse4.2 and mno-sse4 RejectNegative Alias to mno-sse4.1. Though, unsure if one will still be able to specify it in #pragma GCC target or target attribute... Jakub
[PATCH] gimple-low: Diagnose assume attr expressions defining labels which are used as unary && operands outside of those [PR119537]
Hi! The following testcases ICE on invalid code which defines labels inside of statement expressions and then uses &&label from code outside of the statement expressions. The C++ FE diagnoses that with a warning (not specifically for assume attribute, genericallly about taking address of a label outside of a statement expression so computed goto could violate the requirement that statement expression is not entered from outside of it through a jump into it), the C FE doesn't diagnose anything. Normal direct gotos to such labels are diagnosed by both C and C++. In the assume attribute case it is actually worse than for addresses of labels in normal statement expressions, in that case the labels are still in the current function, so invalid program can still jump to those (and in case of OpenMP/OpenACC where it is also invalid and stuff is moved to a separate function, such movement is done post cfg discovery of FORCED_LABELs and worst case one can run into cases which fail to assemble, but I haven't succeeded in creating ICE for that). For assume at -O0 we'd just throw away the assume expression if it is not a simple condition and so the label is then not defined anywhere and we ICE during cfg pass. The gimplify.cc hunks fix that, as we don't have FORCED_LABELs discovery done yet, it preserves all assume expressions which contain used user labels. With that we ICE during IRA, which is upset about an indirect jump to a label which doesn't exist. So, the gimple-low.cc hunks add diagnostics of the problem, it gathers uids of all the user used labels inside of the assume expressions (usually none) and if it finds any, walks the IL to find uses of those from outside of those expressions now outlined into separate magic functions. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-04-01 Jakub Jelinek PR middle-end/119537 * gimplify.cc (find_used_user_labels): New function. (gimplify_call_expr): Don't remove complex assume expression at -O0 if it defines any user labels. * gimple-low.cc: Include diagnostic-core.h. (assume_labels): New variable. (diagnose_assume_labels): New function. (lower_function_body): Call it via walk_gimple_seq if assume_labels is non-NULL, then BITMAP_FREE assume_labels. (find_assumption_locals_r): Record in assume_labels uids of user labels defined in assume attribute expressions. * c-c++-common/pr119537-1.c: New test. * c-c++-common/pr119537-2.c: New test. --- gcc/gimplify.cc.jj 2025-03-31 12:53:44.853727077 +0200 +++ gcc/gimplify.cc 2025-03-31 17:05:40.854893880 +0200 @@ -4508,6 +4508,21 @@ gimplify_variant_call_expr (tree expr, f } +/* Helper function for gimplify_call_expr, called via walk_tree. + Find used user labels. */ + +static tree +find_used_user_labels (tree *tp, int *, void *) +{ + if (TREE_CODE (*tp) == LABEL_EXPR + && !DECL_ARTIFICIAL (LABEL_EXPR_LABEL (*tp)) + && DECL_NAME (LABEL_EXPR_LABEL (*tp)) + && TREE_USED (LABEL_EXPR_LABEL (*tp))) +return *tp; + return NULL_TREE; +} + + /* Gimplify the CALL_EXPR node *EXPR_P into the GIMPLE sequence PRE_P. WANT_VALUE is true if the result of the call is desired. */ @@ -4568,8 +4583,14 @@ gimplify_call_expr (tree *expr_p, gimple fndecl, 0)); return GS_OK; } - /* If not optimizing, ignore the assumptions. */ - if (!optimize || seen_error ()) + /* If not optimizing, ignore the assumptions unless there +are used user labels in it. */ + if ((!optimize + && !walk_tree_without_duplicates (&CALL_EXPR_ARG (*expr_p, +0), +find_used_user_labels, +NULL)) + || seen_error ()) { *expr_p = NULL_TREE; return GS_ALL_DONE; --- gcc/gimple-low.cc.jj2025-03-21 20:25:38.396064280 +0100 +++ gcc/gimple-low.cc 2025-03-31 20:20:38.961772695 +0200 @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. #include "tree-inline.h" #include "gimple-walk.h" #include "attribs.h" +#include "diagnostic-core.h" /* The differences between High GIMPLE and Low GIMPLE are the following: @@ -78,6 +79,10 @@ struct lower_data bool cannot_fallthru; }; +/* Bitmap of LABEL_DECL uids for user labels moved into assume outlined + functions. */ +static bitmap assume_labels; + static void lower_stmt (gimple_stmt_iterator *, struct lower_data *); static void lower_gimple_bind (gimple_stmt_iterator *, struct lower_data *); static void lower_try_catch (gimple_stmt_iterator *, struct lower_data *); @@ -87,6 +92,29 @@ static void lower_builtin_posix_memalign static void lower_builti
[committed] OpenMP: Reorder diagnostic in modify_call_for_omp_dispatch [PR119559]
Reorder to the diagnostic to avoid issues when nappend < ninteropfor the case no append_args arguments at all (nappend == 0) and one interop clause to dispatch (ninterop == 1) Committed as r15-9120-gde92ac6f11e605. Tobias commit de92ac6f11e605987421fe1443b5b81ff172dbb6 Author: Tobias Burnus Date: Tue Apr 1 10:29:27 2025 +0200 OpenMP: Reorder diagnostic in modify_call_for_omp_dispatch [PR119559] gcc/ChangeLog: PR middle-end/119559 * gimplify.cc (modify_call_for_omp_dispatch): Reorder checks to avoid asserts and bogus diagnostic. --- gcc/gimplify.cc | 36 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index a8399dc8363..02ad3981adf 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -3933,25 +3933,9 @@ modify_call_for_omp_dispatch (tree expr, tree dispatch_clauses, the split between early/late resolution, etc instead of the code as written by the user. */ if (dispatch_interop) -{ - for (tree t = dispatch_interop; t; t = TREE_CHAIN (t)) - if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_INTEROP) - ninterop++; - if (nappend < ninterop) - { - error_at (OMP_CLAUSE_LOCATION (dispatch_interop), - "number of list items in % clause (%d) " - "exceeds the number of % items (%d) for " - "% candidate %qD", - ninterop, nappend, fndecl); - inform (dispatch_append_args - ? EXPR_LOCATION (TREE_PURPOSE (dispatch_append_args)) - : DECL_SOURCE_LOCATION (fndecl), - "% candidate %qD declared here", - fndecl); - ninterop = nappend; - } -} +for (tree t = dispatch_interop; t; t = TREE_CHAIN (t)) + if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_INTEROP) + ninterop++; if (dispatch_interop && !dispatch_device_num) { gcc_checking_assert (ninterop > 1); @@ -3959,7 +3943,19 @@ modify_call_for_omp_dispatch (tree expr, tree dispatch_clauses, "the % clause must be present if the % " "clause has more than one list item"); } - else if (dispatch_append_args) + if (nappend < ninterop) +{ + error_at (OMP_CLAUSE_LOCATION (dispatch_interop), + "number of list items in % clause (%d) " + "exceeds the number of % items (%d) for " + "% candidate %qD", ninterop, nappend, fndecl); + inform (dispatch_append_args + ? EXPR_LOCATION (TREE_PURPOSE (dispatch_append_args)) + : DECL_SOURCE_LOCATION (fndecl), + "% candidate %qD declared here", fndecl); + ninterop = nappend; +} + if (dispatch_append_args) { tree *buffer = XALLOCAVEC (tree, nargs + nappend); tree arg = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
[PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]
Hi, For spiltter after 3_mask it now splits the pattern to *3_mask, causing the splitter doesn't generate nf variant. Add corresponding nf counterpart for define_insn_and_split to make the splitter also works for nf insn. Bootstrapped & regtested on x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: PR target/119539 * config/i386/i386.md (*3_mask_nf): New define_insn_and_split. (*3_mask_1_nf): Likewise. (*3_mask): Use force_lowpart_subreg. gcc/testsuite/ChangeLog: PR target/119539 * gcc.target/i386/apx-nf-pr119539.c: New test. --- gcc/config/i386/i386.md | 46 ++- .../gcc.target/i386/apx-nf-pr119539.c | 6 +++ 2 files changed, 50 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index f7f790d2aeb..42312f0c330 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -18131,6 +18131,30 @@ (define_expand "3" DONE; }) +;; Avoid useless masking of count operand. +(define_insn_and_split "*3_mask_nf" + [(set (match_operand:SWI 0 "nonimmediate_operand") + (any_rotate:SWI + (match_operand:SWI 1 "nonimmediate_operand") + (subreg:QI + (and + (match_operand 2 "int248_register_operand" "c") + (match_operand 3 "const_int_operand")) 0)))] + "TARGET_APX_NF + && ix86_binary_operator_ok (, mode, operands) + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) + == GET_MODE_BITSIZE (mode)-1 + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(set (match_dup 0) + (any_rotate:SWI (match_dup 1) + (match_dup 2)))] +{ + operands[2] = force_lowpart_subreg (QImode, operands[2], + GET_MODE (operands[2])); +}) + ;; Avoid useless masking of count operand. (define_insn_and_split "*3_mask" [(set (match_operand:SWI 0 "nonimmediate_operand") @@ -18153,8 +18177,8 @@ (define_insn_and_split "*3_mask" (match_dup 2))) (clobber (reg:CC FLAGS_REG))])] { - operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); - operands[2] = gen_lowpart (QImode, operands[2]); + operands[2] = force_lowpart_subreg (QImode, operands[2], + GET_MODE (operands[2])); }) (define_split @@ -18174,6 +18198,24 @@ (define_split (and:SI (match_dup 2) (match_dup 3)) 0)))] "operands[4] = gen_reg_rtx (mode);") +(define_insn_and_split "*3_mask_1_nf" + [(set (match_operand:SWI 0 "nonimmediate_operand") + (any_rotate:SWI + (match_operand:SWI 1 "nonimmediate_operand") + (and:QI + (match_operand:QI 2 "register_operand" "c") + (match_operand:QI 3 "const_int_operand"] + "TARGET_APX_NF + && ix86_binary_operator_ok (, mode, operands) + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) + == GET_MODE_BITSIZE (mode)-1 + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(set (match_dup 0) + (any_rotate:SWI (match_dup 1) + (match_dup 2)))]) + (define_insn_and_split "*3_mask_1" [(set (match_operand:SWI 0 "nonimmediate_operand") (any_rotate:SWI diff --git a/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c new file mode 100644 index 000..5dfec55ed76 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c @@ -0,0 +1,6 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mapx-features=nf -march=x86-64 -O2" } */ +/* { dg-final { scan-assembler-times "\{nf\} rol" 2 } } */ + +long int f1 (int x) { return ~(1ULL << (x & 0x3f)); } +long int f2 (char x) { return ~(1ULL << (x & 0x3f)); } -- 2.31.1
Re: [PATCH] libstdc++: Fix -Warray-bounds warning in std::vector::resize [PR114945]
On Mon, Mar 31, 2025 at 6:23 PM Jonathan Wakely wrote: > This is yet another false positive warning fix. This time the compiler > can't prove that when the vector has sufficient excess capacity to > append new elements, the pointer to the existing storage is not null. > > libstdc++-v3/ChangeLog: > > PR libstdc++/114945 > * include/bits/vector.tcc (vector::_M_default_append): Add > unreachable condition so the compiler knows that _M_finish is > not null. > * testsuite/23_containers/vector/capacity/114945.cc: New test. > --- > > Tested x86_64-linux. > LGTM > > libstdc++-v3/include/bits/vector.tcc | 3 ++ > .../23_containers/vector/capacity/114945.cc | 36 +++ > 2 files changed, 39 insertions(+) > create mode 100644 > libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc > > diff --git a/libstdc++-v3/include/bits/vector.tcc > b/libstdc++-v3/include/bits/vector.tcc > index 7a92f34ec64..66d73b4cfd7 100644 > --- a/libstdc++-v3/include/bits/vector.tcc > +++ b/libstdc++-v3/include/bits/vector.tcc > @@ -768,6 +768,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER > > if (__navail >= __n) > { > + if (!this->_M_impl._M_finish) > + __builtin_unreachable(); > + > _GLIBCXX_ASAN_ANNOTATE_GROW(__n); > this->_M_impl._M_finish = > std::__uninitialized_default_n_a(this->_M_impl._M_finish, > diff --git > a/libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc > b/libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc > new file mode 100644 > index 000..daafc59d5a9 > --- /dev/null > +++ b/libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc > @@ -0,0 +1,36 @@ > +// { dg-options "-O2 -Werror=stringop-overflow -Werror=array-bounds" } > +// { dg-do compile { target c++11 } } > + > +// Bug libstdc++/114945 > +// Sporadic std::vector::resize() -Wstringop-overflow or -Warray-bounds > warning > + > +#include > +#include > +template struct b { > + void resize(std::size_t c) { d.resize(c); } > + template void f(a, e); > + std::vector d; > +}; > +#include > +std::regex g; > +uint64_t h; > +uint32_t i; > +struct s { > + enum class j : size_t; > + void k(); > + using l = b; > + std::vector m; > +}; > +enum class s::j : size_t { n }; > +void o() { g = ""; } > +void s::k() { > + l p; > + auto q = uint32_t(), r = uint32_t(); > + if (h) > +r = i; > + b t; > + if (q || r) > +p.f(j::n, 5); > + t.resize(4); > + m.push_back(p); > +} > -- > 2.49.0 > >
Re: [PATCH] profile: Another profiling musttail call fix [PR119535]
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > Hi! > > As the following testcase shows, EDGE_FAKE edges from musttail calls to > EXIT aren't the only edges we should ignore, we need to ignore also > edges created by the splitting of blocks for the EDGE_FAKE creation that > point from the musttail calls to the fallthrough block, which typically does > the return or with PHIs for the return value. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? OK > 2025-04-01 Jakub Jelinek > > PR gcov-profile/119535 > * profile.cc (branch_prob): Ignore any edges from bbs ending with > musttail call, rather than only EDGE_FAKE edges from those to EXIT. > > * c-c++-common/pr119535.c: New test. > > --- gcc/profile.cc.jj 2025-03-27 09:29:37.112574359 +0100 > +++ gcc/profile.cc2025-03-31 10:24:39.593446405 +0200 > @@ -1341,8 +1341,7 @@ branch_prob (bool thunk) > ignored_edges++; > } > - /* Ignore fake edges after musttail calls. */ > - if ((e->flags & EDGE_FAKE) > - && e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) > + /* Ignore edges after musttail calls. */ > + if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)) > { > gimple_stmt_iterator gsi = gsi_last_bb (e->src); > gimple *stmt = gsi_stmt (gsi); > --- gcc/testsuite/c-c++-common/pr119535.c.jj 2025-03-31 10:36:18.844758921 > +0200 > +++ gcc/testsuite/c-c++-common/pr119535.c 2025-04-01 09:12:08.998801814 > +0200 > @@ -0,0 +1,31 @@ > +/* PR gcov-profile/119535 > +/* { dg-do compile { target musttail } } */ > +/* { dg-options "-fprofile-generate -O2" } */ > +/* { dg-require-profiling "-fprofile-generate" } */ > + > +[[gnu::noipa]] int > +foo (int x) > +{ > + return 42 + x; > +} > + > +int > +bar (int x) > +{ > + foo (x); > + foo (2); > + [[clang::musttail]] return foo (3); > +} > + > +int > +baz (int x) > +{ > + if (x == 42) > +return -1; > + else if (x == 15) > +return 25; > + else if (x == 26) > +[[clang::musttail]] return foo (4); > + else > +[[clang::musttail]] return foo (5); > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]
On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > Hi, > > For spiltter after 3_mask it now splits the pattern > to *3_mask, causing the splitter doesn't generate > nf variant. Add corresponding nf counterpart for define_insn_and_split > to make the splitter also works for nf insn. > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > > Ok for trunk? > > gcc/ChangeLog: > > PR target/119539 > * config/i386/i386.md (*3_mask_nf): New > define_insn_and_split. > (*3_mask_1_nf): Likewise. > (*3_mask): Use force_lowpart_subreg. > > gcc/testsuite/ChangeLog: > > PR target/119539 > * gcc.target/i386/apx-nf-pr119539.c: New test. > --- > gcc/config/i386/i386.md | 46 ++- > .../gcc.target/i386/apx-nf-pr119539.c | 6 +++ > 2 files changed, 50 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index f7f790d2aeb..42312f0c330 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -18131,6 +18131,30 @@ (define_expand "3" >DONE; > }) > > +;; Avoid useless masking of count operand. > +(define_insn_and_split "*3_mask_nf" > + [(set (match_operand:SWI 0 "nonimmediate_operand") > + (any_rotate:SWI > + (match_operand:SWI 1 "nonimmediate_operand") > + (subreg:QI > + (and > + (match_operand 2 "int248_register_operand" "c") > + (match_operand 3 "const_int_operand")) 0)))] > + "TARGET_APX_NF > + && ix86_binary_operator_ok (, mode, operands) > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) > + == GET_MODE_BITSIZE (mode)-1 > + && ix86_pre_reload_split ()" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (any_rotate:SWI (match_dup 1) > + (match_dup 2)))] > +{ > + operands[2] = force_lowpart_subreg (QImode, operands[2], > + GET_MODE (operands[2])); > +}) Can we just change the output in original pattern, I think combine will still match the pattern even w/ clobber flags. like @@ -17851,8 +17851,17 @@ (define_insn_and_split "*3_mask" (match_dup 2))) (clobber (reg:CC FLAGS_REG))])] { - operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); - operands[2] = gen_lowpart (QImode, operands[2]); + if (TARGET_APX_F) +{ + emit_move_insn (operands[0], +gen_rtx_ (mode, operands[1], operands[2])); + DONE; +} + else +{ + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); + operands[2] = gen_lowpart (QImode, operands[2]); +} } Also we can remove constraint "c" in the original pattern. > + > ;; Avoid useless masking of count operand. > (define_insn_and_split "*3_mask" >[(set (match_operand:SWI 0 "nonimmediate_operand") > @@ -18153,8 +18177,8 @@ (define_insn_and_split "*3_mask" >(match_dup 2))) >(clobber (reg:CC FLAGS_REG))])] > { > - operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > - operands[2] = gen_lowpart (QImode, operands[2]); > + operands[2] = force_lowpart_subreg (QImode, operands[2], > + GET_MODE (operands[2])); > }) > > (define_split > @@ -18174,6 +18198,24 @@ (define_split > (and:SI (match_dup 2) (match_dup 3)) 0)))] > "operands[4] = gen_reg_rtx (mode);") > > +(define_insn_and_split "*3_mask_1_nf" > + [(set (match_operand:SWI 0 "nonimmediate_operand") > + (any_rotate:SWI > + (match_operand:SWI 1 "nonimmediate_operand") > + (and:QI > + (match_operand:QI 2 "register_operand" "c") > + (match_operand:QI 3 "const_int_operand"] > + "TARGET_APX_NF > + && ix86_binary_operator_ok (, mode, operands) > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) > + == GET_MODE_BITSIZE (mode)-1 > + && ix86_pre_reload_split ()" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (any_rotate:SWI (match_dup 1) > + (match_dup 2)))]) > + > (define_insn_and_split "*3_mask_1" >[(set (match_operand:SWI 0 "nonimmediate_operand") > (any_rotate:SWI > diff --git a/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c > b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c > new file mode 100644 > index 000..5dfec55ed76 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c > @@ -0,0 +1,6 @@ > +/* { dg-do compile { target { ! ia32 } } } */ > +/* { dg-options "-mapx-features=nf -march=x86-64 -O2" } */ > +/* { dg-final { scan-assembler-times "\{nf\} rol" 2 } } */ > + > +long int f1 (int x) { return ~(1ULL << (x & 0x3f)); } > +long int f2 (char x) { return ~(1ULL << (x & 0x3f)); } > -- > 2.31.1 > -- BR, Hongtao
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao: > > > On Apr 1, 2025, at 10:04, Martin Uecker wrote: > > > > > > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling: > > > > I'd like to offer up this to solve the issues we're facing. This is a > > > > combination of everything that's been discussed here (or at least that > > > > I've been able to read in the centi-thread :-). > > > > Thanks! I think this proposal much better as it avoids undue burden > > on parsers, but it does not address all my concerns. > > > > > > From my side, the issue about compromising the scoping rules of C > > is also about unintended non-local effects of code changes. In > > my opinion, a change in a library header elsewhere should not cause > > code in a local scope (which itself might also come from a macro) to > > emit a warning or require a programmer to add a workaround. So I am > > not convinced that adding warnings or a workaround such as > > __builtin_global_ref is a good solution. > > > > > > I could see the following as a possible way forward: We only > > allow the following two syntaxes: > > > > 1. Single argument referring to a member. > > > > __counted_by(len) > > > > with an argument that must be a single identifier and where > > the identifier then must refer to a struct member. > > > > (I still think this is not ideal and potentially > > confusing, but in contrast to new scoping rules it is > > at least relatively easily to explain as a special rule.). > > > > > > 2. Forward declarations. > > > > __counted_by(size_t len; len + PADDING) > > In the above, the PADDING is some constant? In principle - when considering only the name lookup rules - it could be a constant, a global variable, or an automatic variable, i.e. any ordinary identifiers which is visible at this point. > > More complicated expressions involving globals will not be supported? I think one could allow such expressions, But I think the expressions should be restricted to expressions which have no side effects. > > > where then the second part can also be a more complicated > > expression, but with the explicit requirement that all > > identifiers in this expression are then looked up according to > > regular C language rules. So except for the forward declared > > member(s) they are *never* looked up in the member namespace of > > the struct, i.e. no new name lookup rules are introduced. > > One question here: > > What’s the major issue if we’d like to add one new scoping rule, for example, > “Structure scope” (the same as the C++’s instance scope) to C? > > (In addition to the "VLA in structure" issue I mentioned in my previous > writeup, > is there any other issue to prevent this new scoping rule being added into C > ?). Note that the "VLA in structure" is a bit of a red herring. The exact same issues apply to lookup of any other ordinary identifiers in this context. enum { MY_BUF_SIZE = 100 }; struct foo { char buf[MY_BUF_SIZE]; }; C++ has instance scope for member functions. The rules for C++ are also complex and not very consistent (see the examples I posted earlier, demonstrating UB and compiler divergence). For C such a concept would be new and much less useful, so the trade-off seems unfavorable (in constrast to C++ where it is needed). I also see others issues: Fully supporting instance scope would require changes to how C is parsed, placing a burden on all C compilers and tooling. Depending on how you specify it, it would also cause a change in semantics for existing code, something C tries very hard to avoid. If you add warnings as mitigation, it has the problem that it causes non-local effects where introducing a name in in enclosing scope somewhere else now necessitates a change to unrelated code, exactly what scoping rules are meant to prevent. In any case, it seems a major change with many ramifications, including possibly unintended ones. This should certainly not be done without having a clear specification and support from WG14 (and probably not done at all.) Martin > > Qing > > > > > > > > I think this could address my concerns about breaking > > scoping in C. Still, I personally would prefer designator syntax > > for both C and C++ as a nicer solution, and one that already > > has some support from WG14. > > > > Martin > > > > > > > > > > > > --- > > > > > > > > 1. The use of '__self' isn't feasible, so we won't use it. Instead, > > > > we'll rely upon the current behavior—resolving any identifiers to the > > > > "instance scope". This new scope is used __only__ in attributes, and > > > > resolves identifiers to those in the least enclosing, non-anonymous > > > > struct. For example: > > > > > > > > struct foo { > > > > char count; > > > > struct bar { > > > >struct { > > > > int len; > > > >}; > > > >struct { > > > > struct { > > > >int *valid_use __counted_by(len); // Valid. > > > > }; > > > >
[PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]
The problem in PR101523 was that, after each successful 2->2 combination attempt, distribute_links would search further and further for the next combinable use of the i2 destination. The original patch for the PR dealt with that by disallowing such combinations. However, that led to various optimisation regressions, so there was interest in allowing the combinations again, at least until an alternative way of getting the same results is in place. This patch does that, but limits distribute_links to a certain number of instructions when i2 is unchanged. As Segher said in the PR trail, it would make more conceptual sense to apply the limit unconditionally, but I thought it would be better to change as little as possible at this development stage. Logically, in stage 1, the --param should be applied directly by distribute_links with no input from callers. As I mentioned in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28 I think it's safe to drop log links even if a use exists. All processing of log links seems to handle the absence of a link for a particular register in a conservative way. The initial set-up errs on the side of dropping links, since for example create_log_links has: /* flow.c claimed: We don't build a LOG_LINK for hard registers contained in ASM_OPERANDs. If these registers get replaced, we might wind up changing the semantics of the insn, even if reload can make what appear to be valid assignments later. */ if (regno < FIRST_PSEUDO_REGISTER && asm_noperands (PATTERN (use_insn)) >= 0) continue; which excludes combinations by dropping log links, rather than during try_combine. And: /* If this register is being initialized using itself, and the register is uninitialized in this basic block, and there are no LOG_LINKS which set the register, then part of the register is uninitialized. In that case we can't assume anything about the number of nonzero bits. ??? We could do better if we checked this in reg_{nonzero_bits,num_sign_bit_copies}_for_combine. Then we could avoid making assumptions about the insn which initially sets the register, while still using the information in other insns. We would have to be careful to check every insn involved in the combination. */ if (insn && reg_referenced_p (x, PATTERN (insn)) && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)), REGNO (x))) { struct insn_link *link; FOR_EACH_LOG_LINK (link, insn) if (dead_or_set_p (link->insn, x)) break; if (!link) { rsp->nonzero_bits = GET_MODE_MASK (mode); rsp->sign_bit_copies = 1; return; } } treats the lack of a log link is a possible sign of uninitialised data, but that would be a missed optimisation rather than a correctness issue. One question is what the default --param value should be. I went with Jakub's suggestion of 3000 from: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25 Also, to answer Jakub's question in that comment, I tried bisecting: int limit = atoi (getenv ("BISECT")); (so applying the limit for all calls from try_combine) with an abort in distribute_links if the limit caused a link to be skipped. The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap to succeed with --enable-languages=all --enable-checking=yes,rtl,extra was 142, so much lower than the parameter value. I realised too late that --enable-checking=release would probably have been a more interesting test. Some of the try_combine changes come from Richi's patch in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53 Bootstrapped & regression-tested on aarch64-linux-gnu, powerpc64le-linux-gnu and x86_64-linux-gnu. OK to install? Richard gcc/ PR testsuite/116398 * params.opt (-param=max-combine-search-insns=): New param. * doc/invoke.texi: Document it. * combine.cc (distribute_links): Add a limit parameter. (try_combine): Use the new param to limit distribute_links when i2 is unchanged. Return i3 rather than i2 if i2 is unchanged. gcc/testsuite/ * gcc.target/aarch64/popcnt-le-1.c: Account for commutativity of TST. * gcc.target/aarch64/popcnt-le-3.c: Likewise AND. * gcc.target/aarch64/sve/pred-not-gen-1.c: Revert previous patch. * gcc.target/aarch64/sve/pred-not-gen-4.c: Likewise. * gcc.target/aarch64/sve/var_stride_2.c: Likewise. * gcc.target/aarch64/sve/var_stride_4.c: Likewise. Co-authored-by: Richard Biener --- gcc/combine.cc| 30 +-- gcc/doc/invoke.texi | 9
[PATCH] profile: Another profiling musttail call fix [PR119535]
Hi! As the following testcase shows, EDGE_FAKE edges from musttail calls to EXIT aren't the only edges we should ignore, we need to ignore also edges created by the splitting of blocks for the EDGE_FAKE creation that point from the musttail calls to the fallthrough block, which typically does the return or with PHIs for the return value. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-04-01 Jakub Jelinek PR gcov-profile/119535 * profile.cc (branch_prob): Ignore any edges from bbs ending with musttail call, rather than only EDGE_FAKE edges from those to EXIT. * c-c++-common/pr119535.c: New test. --- gcc/profile.cc.jj 2025-03-27 09:29:37.112574359 +0100 +++ gcc/profile.cc 2025-03-31 10:24:39.593446405 +0200 @@ -1341,8 +1341,7 @@ branch_prob (bool thunk) ignored_edges++; } - /* Ignore fake edges after musttail calls. */ - if ((e->flags & EDGE_FAKE) - && e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) + /* Ignore edges after musttail calls. */ + if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)) { gimple_stmt_iterator gsi = gsi_last_bb (e->src); gimple *stmt = gsi_stmt (gsi); --- gcc/testsuite/c-c++-common/pr119535.c.jj2025-03-31 10:36:18.844758921 +0200 +++ gcc/testsuite/c-c++-common/pr119535.c 2025-04-01 09:12:08.998801814 +0200 @@ -0,0 +1,31 @@ +/* PR gcov-profile/119535 +/* { dg-do compile { target musttail } } */ +/* { dg-options "-fprofile-generate -O2" } */ +/* { dg-require-profiling "-fprofile-generate" } */ + +[[gnu::noipa]] int +foo (int x) +{ + return 42 + x; +} + +int +bar (int x) +{ + foo (x); + foo (2); + [[clang::musttail]] return foo (3); +} + +int +baz (int x) +{ + if (x == 42) +return -1; + else if (x == 15) +return 25; + else if (x == 26) +[[clang::musttail]] return foo (4); + else +[[clang::musttail]] return foo (5); +} Jakub
[PATCH] config, toplevel, Darwin: Pass -B instead of -L to C++ commands.
Another misconfigure (Darwin-only) found by a combination of increases in system security contraints and the COBOL runtime. We will now fail any C++ configure test that needs the binary to run. Tested on x86_64-darwin17,21,24 and on x86_64, powerpc64le, aarch64-linux confirmed manually that the CXX_FOR_TARGET commands are unchanged for Linux. OK for trunk (and backports)? thanks, Iain --- 8< --- Darwin from 10.11 needs embedded rpaths to find the correct libraries at runtime. Recent increases in hardening have made it such that the dynamic loader will no longer fall back to using an installed libstdc++ when the (new) linked one is not found. This means we fail configure tests (that should pass) for runtimes that use C++. We can resolve this by passing '-B' to the C++ command lines instead of '-L' (-B implies -L on Darwin, but also causes a corresponding embedded rpath). ChangeLog: * configure: Regenerate. * configure.ac: Use -B instead of -L to specifiy the C++ runtime paths on Darwin. Signed-off-by: Iain Sandoe --- configure| 100 +-- configure.ac | 16 +++-- 2 files changed, 112 insertions(+), 4 deletions(-) diff --git a/configure b/configure index 036142a8d06..bf574efd1d8 100755 --- a/configure +++ b/configure @@ -19081,7 +19081,101 @@ $as_echo "pre-installed" >&6; } fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target c++" >&5 +case $target in + *-*-darwin*) +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target c++" >&5 +$as_echo_n "checking where to find the target c++... " >&6; } +if test "x${build}" != "x${host}" ; then + if expr "x$CXX_FOR_TARGET" : "x/" > /dev/null; then +# We already found the complete path +ac_dir=`dirname $CXX_FOR_TARGET` +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5 +$as_echo "pre-installed in $ac_dir" >&6; } + else +# Canadian cross, just use what we found +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5 +$as_echo "pre-installed" >&6; } + fi +else + ok=yes + case " ${configdirs} " in +*" gcc "*) ;; +*) ok=no ;; + esac + case ,${enable_languages}, in +*,c++,*) ;; +*) ok=no ;; + esac + if test $ok = yes; then +# An in-tree tool is available and we can use it +CXX_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/xg++ -B$$r/$(HOST_SUBDIR)/gcc/ -nostdinc++ `if test -f $$r/$(TARGET_SUBDIR)/libstdc++-v3/scripts/testsuite_flags; then $(SHELL) $$r/$(TARGET_SUBDIR)/libstdc++-v3/scripts/testsuite_flags --build-includes; else echo -funconfigured-libstdc++-v3 ; fi` -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs' +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5 +$as_echo "just compiled" >&6; } + elif expr "x$CXX_FOR_TARGET" : "x/" > /dev/null; then +# We already found the complete path +ac_dir=`dirname $CXX_FOR_TARGET` +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5 +$as_echo "pre-installed in $ac_dir" >&6; } + elif test "x$target" = "x$host"; then +# We can use an host tool +CXX_FOR_TARGET='$(CXX)' +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5 +$as_echo "host tool" >&6; } + else +# We need a cross tool +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5 +$as_echo "pre-installed" >&6; } + fi +fi + +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target c++ for libstdc++" >&5 +$as_echo_n "checking where to find the target c++ for libstdc++... " >&6; } +if test "x${build}" != "x${host}" ; then + if expr "x$RAW_CXX_FOR_TARGET" : "x/" > /dev/null; then +# We already found the complete path +ac_dir=`dirname $RAW_CXX_FOR_TARGET` +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5 +$as_echo "pre-installed in $ac_dir" >&6; } + else +# Canadian cross, just use what we found +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5 +$as_echo "pre-installed" >&6; } + fi +else + ok=yes + case " ${configdirs} " in +*" gcc "*) ;; +*) ok=no ;; + esac + case ,${enable_languages}, in +*,c++,*) ;; +*) ok=no ;; + esac + if test $ok = yes; then +# An in-tree tool is available and we can use it +RAW_CXX_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/xgcc -shared-libgcc -B$$r/$(HOST_SUBDIR)/gcc -nostdinc++ -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs' +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5 +$as_echo "just compiled" >&6; } + elif expr "x$RAW_CXX_FOR_TARGET" : "x/" > /dev/null; then +# We already found the complete path +ac_dir=`dirname $RAW_CXX_FOR_TARGET` +{ $as_echo "$as_me:${as_lineno-$LINENO}:
[PATCH] tailc: Improve tail recursion handling [PR119493]
Hi! This is a partial step towards fixing that PR. For musttail recursive calls which have non-is_gimple_reg_type typed parameters, the only case we've handled was if the exact parameter was passed through (perhaps modified, but still the same PARM_DECL). That isn't necessary, we can copy the argument to the parameter as well (just need to watch for the use of the parameter in later arguments, say musttail recursive call which swaps 2 structure arguments). The patch attempts to play safe and punts if any of the parameters are addressable (like we do for all normal tail calls and tail recursions, except for musttail in the posted unreviewed patch). With this patch (at least when early inlining isn't done on not yet optimized body) inlining should see already tail recursion optimized body and will not have problems with SRA breaking musttail. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-04-01 Jakub Jelinek PR tree-optimization/119493 * tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion if some arguments don't have is_gimple_reg_type, only punt if they have non-POD types, or volatile, or addressable. Set tailr_arg_needs_copy in those cases too. (eliminate_tail_call): Copy call arguments to params if they don't have is_gimple_reg_type, use temporaries if the argument is used later. (tree_optimize_tail_calls_1): Skip !is_gimple_reg_type tailr_arg_needs_copy parameters. Formatting fix. * gcc.dg/pr119493-1.c: New test. --- gcc/tree-tailcall.cc.jj 2025-03-28 10:49:30.0 +0100 +++ gcc/tree-tailcall.cc2025-03-31 12:31:01.717603446 +0200 @@ -672,19 +672,16 @@ find_tail_calls (basic_block bb, struct have a copyable type and the two arguments must have reasonably equivalent types. The latter requirement could be relaxed if we emitted a suitable type conversion statement. */ - if (!is_gimple_reg_type (TREE_TYPE (param)) + if (TREE_ADDRESSABLE (TREE_TYPE (param)) || !useless_type_conversion_p (TREE_TYPE (param), TREE_TYPE (arg))) break; - /* The parameter should be a real operand, so that phi node -created for it at the start of the function has the meaning -of copying the value. This test implies is_gimple_reg_type -from the previous condition, however this one could be -relaxed by being more careful with copying the new value -of the parameter (emitting appropriate GIMPLE_ASSIGN and -updating the virtual operands). */ - if (!is_gimple_reg (param)) + if (is_gimple_reg_type (TREE_TYPE (param)) + ? !is_gimple_reg (param) + : (!is_gimple_variable (param) +|| TREE_THIS_VOLATILE (param) +|| may_be_aliased (param))) break; } } @@ -934,9 +931,9 @@ find_tail_calls (basic_block bb, struct param = DECL_CHAIN (param), idx++) { tree ddef, arg = gimple_call_arg (call, idx); - if (is_gimple_reg (param) - && (ddef = ssa_default_def (cfun, param)) - && (arg != ddef)) + if (!is_gimple_reg (param) + || ((ddef = ssa_default_def (cfun, param)) + && arg != ddef)) bitmap_set_bit (tailr_arg_needs_copy, idx); } } @@ -1212,6 +1209,7 @@ eliminate_tail_call (struct tailcall *t, /* Add phi node entries for arguments. The ordering of the phi nodes should be the same as the ordering of the arguments. */ + auto_vec copies; for (param = DECL_ARGUMENTS (current_function_decl), idx = 0, gpi = gsi_start_phis (first); param; @@ -1220,6 +1218,35 @@ eliminate_tail_call (struct tailcall *t, if (!bitmap_bit_p (tailr_arg_needs_copy, idx)) continue; + if (!is_gimple_reg_type (TREE_TYPE (param))) + { + if (param == gimple_call_arg (stmt, idx)) + continue; + /* First check if param isn't used by any of the following +call arguments. If it is, we need to copy first to +a temporary and only after doing all the assignments copy it +to param. */ + size_t idx2 = idx + 1; + tree param2 = DECL_CHAIN (param); + for (; param2; param2 = DECL_CHAIN (param2), idx2++) + if (!is_gimple_reg_type (TREE_TYPE (param))) + { + tree base = get_base_address (gimple_call_arg (stmt, idx2)); + if (base == param) + break; + } + tree tmp = param; + if (param2) + { + tmp = create_tmp_var (TREE_TYPE (param)); + copies.safe_push (param); +
[COMMITTED] Doc: Document enum with underlying type extension [PR117689]
This is a C23/C++11 feature that is supported as an extension with earlier -std= options too, but was never previously documented. It interacts with the already-documented forward enum definition extension, so I have merged discussion of the two extensions into the same section. gcc/ChangeLog PR c/117689 * doc/extend.texi (Incomplete Enums): Rename to (Enum Extensions): This. Document support for specifying the underlying type of an enum as an extension in all earlier C and C++ standards. Document that a forward declaration with underlying type is not an incomplete type, and which dialects GCC supports that in. --- gcc/doc/extend.texi | 62 - 1 file changed, 50 insertions(+), 12 deletions(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index a49b1cdd1e3..76fb210060d 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -12943,7 +12943,7 @@ C and/or C++ standards, while others remain specific to GNU C. * Typeof:: @code{typeof}: referring to the type of an expression. * Offsetof::Special syntax for @code{offsetof}. * Alignment:: Determining the alignment of a function, type or variable. -* Incomplete Enums::@code{enum foo;}, with details to follow. +* Enum Extensions:: Forward declarations and specifying the underlying type. * Variadic Macros:: Macros with a variable number of arguments. * Conditionals::Omitting the middle operand of a @samp{?:} expression. * Case Ranges:: `case 1 ... 9' and such. @@ -13643,22 +13643,60 @@ If the operand of the @code{__alignof__} expression is a function, the expression evaluates to the alignment of the function which may be specified by attribute @code{aligned} (@pxref{Common Function Attributes}). -@node Incomplete Enums -@subsection Incomplete @code{enum} Types +@node Enum Extensions +@subsection Extensions to @code{enum} Type Declarations +@anchor{Incomplete Enums} +@cindex @code{enum} extensions +@cindex base type of an @code{enum} +@cindex underlying type of an @code{enum} +@cindex forward declaration of an @code{enum} +@cindex opaque @code{enum} types +@cindex incomplete @code{enum} types -You can define an @code{enum} tag without specifying its possible values. -This results in an incomplete type, much like what you get if you write -@code{struct foo} without describing the elements. A later declaration -that does specify the possible values completes the type. +The C23 and C++11 standards added new syntax to specify the underlying +type of an @code{enum} type. For example, +@smallexample +enum pet : unsigned char @{ CAT, DOG, ROCK @}; +@end smallexample + +In GCC, this feature is supported as an extension in all older +dialects of C and C++ as well. For C++ dialects before C++11, use +@option{-Wno-c++11-extensions} to silence the associated warnings. + +You can also forward-declare an @code{enum} type, without specifying +its possible values. The enumerators are supplied in a later +redeclaration of the type, which must match the underlying type of the +first declaration. + +@smallexample +enum pet : unsigned char; +static enum pet my_pet; +... +enum pet : unsigned char @{ CAT, DOG, ROCK @}; +@end smallexample + +Forward declaration of @code{enum} types with an explicit underlying +type is also a feature of C++11 that is supported as an extension by +GCC for all C dialects. However, it's not available in C++ dialects +prior to C++11. + +The C++ standard refers to a forward declaration of an @code{enum} +with an explicit underlying type as an @dfn{opaque type}. It is not +considered an incomplete type, since its size is known. That means +you can declare variables or allocate storage using the type before +the redeclaration, not just use pointers of that type. + +GCC has also traditionally supported forward declarations of +@code{enum} types that don't include an explicit underlying type +specification. This results in an incomplete type, much like what you +get if you write @code{struct foo} without describing the elements. You cannot allocate variables or storage using the type while it is incomplete. However, you can work with pointers to that type. -This extension may not be very useful, but it makes the handling of -@code{enum} more consistent with the way @code{struct} and @code{union} -are handled. - -This extension is not supported by GNU C++. +Forward-declaring an incomplete enum type without an explicit +underlying type is supported as an extension in all GNU C dialects, +but is not supported at all in GNU C++. @node Variadic Macros @subsection Macros with a Variable Number of Arguments. -- 2.34.1
[PATCH] testsuite: Add support for GCOV_UNDER_TEST
After commit r15-8947-g8ed2d5d219e999, which added new tests using gcov, the CI noticed failures because it was calling 'gcov' instead of $target-gcov. This is because the CI scripts override GXX_UNDER_TEST, but still run the testsuite in-tree, and gcc-transform-out-of-tree only depends on TESTING_IN_BUILD_TREE but the definition of GCOV uses GCC_UNDER_TEST, GXX_UNDER_TEST or GDC_UNDER_TEST (assuming their default values when TESTING_IN_BUILD_TREE). To handle such a case, this patch adds support for a new variable, GCOV_UNDER_TEST, which overrides the current behavior if defined, and has an effect similar to overriding GCC_UNDER_TEST etc... Unfortunately, the change needs to be duplicated in several places, which use either GCC_UNDER_TEST, GXX_UNDER_TEST or GDC_UNDER_TEST. Tested g++.dg/gcov/gcov.exp and now g++.dg/gcov/gcov-22.C passes (calling /bin/$target-gcov as instructed by the CI scripts). No change observed on gcc.misc-tests/gcov.exp, and I could not test gdc.dg/gcov.exp and gnat.dg/gcov/gcov.exp. gcc/testsuite/ChangeLog: * g++.dg/gcov/gcov.exp: Handle GCOV_UNDER_TEST. * gcc.misc-tests/gcov.exp: Likewise. * gdc.dg/gcov.exp: Likewise. * gnat.dg/gcov/gcov.exp: Likewise. --- gcc/testsuite/g++.dg/gcov/gcov.exp| 15 +++ gcc/testsuite/gcc.misc-tests/gcov.exp | 15 +++ gcc/testsuite/gdc.dg/gcov.exp | 15 +++ gcc/testsuite/gnat.dg/gcov/gcov.exp | 15 +++ 4 files changed, 44 insertions(+), 16 deletions(-) diff --git a/gcc/testsuite/g++.dg/gcov/gcov.exp b/gcc/testsuite/g++.dg/gcov/gcov.exp index 50f60c4a011..d85bf125d16 100644 --- a/gcc/testsuite/g++.dg/gcov/gcov.exp +++ b/gcc/testsuite/g++.dg/gcov/gcov.exp @@ -21,12 +21,19 @@ load_lib g++-dg.exp load_lib gcov.exp global GXX_UNDER_TEST +global GCOV_UNDER_TEST -# Find gcov in the same directory as $GXX_UNDER_TEST. -if { ![is_remote host] && [string match "*/*" [lindex $GXX_UNDER_TEST 0]] } { -set GCOV [file dirname [lindex $GXX_UNDER_TEST 0]]/[gcc-transform-out-of-tree gcov] +# Find gcov in the same directory as $GXX_UNDER_TEST. under +# GCOV_UNDER_TEST is defined. + +if ![info exists GCOV_UNDER_TEST] { +if { ![is_remote host] && [string match "*/*" [lindex $GXX_UNDER_TEST 0]] } { + set GCOV [file dirname [lindex $GXX_UNDER_TEST 0]]/[gcc-transform-out-of-tree gcov] +} else { + set GCOV [gcc-transform-out-of-tree gcov] +} } else { -set GCOV [gcc-transform-out-of-tree gcov] +set GCOV $GCOV_UNDER_TEST } # Initialize harness. diff --git a/gcc/testsuite/gcc.misc-tests/gcov.exp b/gcc/testsuite/gcc.misc-tests/gcov.exp index c8f20e1e70e..4278b625c0b 100644 --- a/gcc/testsuite/gcc.misc-tests/gcov.exp +++ b/gcc/testsuite/gcc.misc-tests/gcov.exp @@ -21,12 +21,19 @@ load_lib gcc-dg.exp load_lib gcov.exp global GCC_UNDER_TEST +global GCOV_UNDER_TEST -# For now find gcov in the same directory as $GCC_UNDER_TEST. -if { ![is_remote host] && [string match "*/*" [lindex $GCC_UNDER_TEST 0]] } { -set GCOV [file dirname [lindex $GCC_UNDER_TEST 0]]/[gcc-transform-out-of-tree gcov] +# For now find gcov in the same directory as $GCC_UNDER_TEST, unless +# GCOV_UNDER_TEST is defined.. + +if ![info exists GCOV_UNDER_TEST] { +if { ![is_remote host] && [string match "*/*" [lindex $GCC_UNDER_TEST 0]] } { + set GCOV [file dirname [lindex $GCC_UNDER_TEST 0]]/[gcc-transform-out-of-tree gcov] +} else { + set GCOV [gcc-transform-out-of-tree gcov] +} } else { -set GCOV [gcc-transform-out-of-tree gcov] +set GCOV $GCOV_UNDER_TEST } # Initialize harness. diff --git a/gcc/testsuite/gdc.dg/gcov.exp b/gcc/testsuite/gdc.dg/gcov.exp index a65dd34..3f0234e9f2f 100644 --- a/gcc/testsuite/gdc.dg/gcov.exp +++ b/gcc/testsuite/gdc.dg/gcov.exp @@ -21,12 +21,19 @@ load_lib gdc-dg.exp load_lib gcov.exp global GDC_UNDER_TEST +global GCOV_UNDER_TEST -# For now find gcov in the same directory as $GDC_UNDER_TEST. -if { ![is_remote host] && [string match "*/*" [lindex $GDC_UNDER_TEST 0]] } { -set GCOV [file dirname [lindex $GDC_UNDER_TEST 0]]/[gcc-transform-out-of-tree gcov] +# For now find gcov in the same directory as $GDC_UNDER_TEST, unless +# GCOV_UNDER_TEST is defined. + +if ![info exists GCOV_UNDER_TEST] { +if { ![is_remote host] && [string match "*/*" [lindex $GDC_UNDER_TEST 0]] } { + set GCOV [file dirname [lindex $GDC_UNDER_TEST 0]]/[gcc-transform-out-of-tree gcov] +} else { + set GCOV [gcc-transform-out-of-tree gcov] +} } else { -set GCOV [gcc-transform-out-of-tree gcov] +set GCOV $GCOV_UNDER_TEST } # Initialize harness. diff --git a/gcc/testsuite/gnat.dg/gcov/gcov.exp b/gcc/testsuite/gnat.dg/gcov/gcov.exp index 4fa887d5bad..d5977a85e8a 100644 --- a/gcc/testsuite/gnat.dg/gcov/gcov.exp +++ b/gcc/testsuite/gnat.dg/gcov/gcov.exp @@ -21,12 +21,19 @@ load_lib gnat-dg.exp load_lib gcov.exp global GCC_UNDER_TEST +global GCOV_UNDER_TEST -# For now find
Re: [PATCH] simplify-rtx: Fix shortcut for vector eq/ne
On Tue, Apr 01, 2025 at 01:16:32PM +0100, Richard Sandiford wrote: > gcc/ > * simplify-rtx.cc (simplify_const_relational_operation): Generalize > the constant checks in the fold-via-minus path to match the > INTEGRAL_MODE_P condition. LGTM. Jakub
[PATCH] OpenMP: Require target and/or targetsync init modifier [PR118965]
As noted in PR 118965, the initial interop implementation overlooked the requirement in the OpenMP spec that at least one of the "target" and "targetsync" modifiers is required in both the interop construct init clause and the declare variant append_args clause. Adding the check was fairly straightforward, but it broke about a gazillion existing test cases. In particular, things like "init (x, y)" which were previously accepted (and tested for being accepted) aren't supposed to be allowed by the spec, much less things like "init (target)" where target was previously interpreted as a variable name instead of a modifier. Since one of the effects of the change is that at least one modifier is always required, I found that deleting all the code that was trying to detect and handle the no-modifier case allowed for better diagnostics. gcc/c/ChangeLog PR middle-end/118965 * c-parser.cc (c_parser_omp_clause_init_modifiers): Adjust error message. (c_parser_omp_clause_init): Remove code for recognizing clauses without modifiers. Diagnose missing target/targetsync modifier. (c_finish_omp_declare_variant): Diagnose missing target/targetsync modifier. gcc/cp/ChangeLog PR middle-end/118965 * parser.cc (c_parser_omp_clause_init_modifiers): Adjust error message. (cp_parser_omp_clause_init): Remove code for recognizing clauses without modifiers. Diagnose missing target/targetsync modifier. (cp_finish_omp_declare_variant): Diagnose missing target/targetsync modifier. gcc/gfortran/ChangeLog PR middle-end/118965 * openmp.cc (gfc_parser_omp_clause_init_modifiers): Fix some inconsistent code indentation. Remove code for recognizing clauses without modifiers. Diagnose prefer_type without a following '('. Adjust error message for an unrecognized modifier. Diagnose missing target/targetsync modifier. (gfc_match_omp_init): Fix more inconsistent code indentation. gcc/testsuite/ChangeLog PR middle-end/118965 * c-c++-common/gomp/append-args-1.c: Add target/targetsync modifiers so tests do what they were previously supposed to do. Adjust expected output. * c-c++-common/gomp/append-args-7.c: Likewise. * c-c++-common/gomp/append-args-8.c: Likewise. * c-c++-common/gomp/append-args-9.c: Likewise. * c-c++-common/gomp/interop-1.c: Likewise. * c-c++-common/gomp/interop-2.c: Likewise. * c-c++-common/gomp/interop-3.c: Likewise. * c-c++-common/gomp/interop-4.c: Likewise. * c-c++-common/gomp/pr118965-1.c: New. * c-c++-common/gomp/pr118965-2.c: New. * g++.dg/gomp/append-args-1.C: Add target/targetsync modifiers and adjust expected output. * g++.dg/gomp/append-args-2.C: Likewise. * g++.dg/gomp/append-args-6.C: Likewise. * g++.dg/gomp/append-args-7.C: Likewise. * g++.dg/gomp/append-args-8.C: Likewise. * g++.dg/gomp/interop-5.C: Likewise. * gfortran.dg/gomp/append_args-1.f90: Add target/targetsync modifiers and adjust expected output. * gfortran.dg/gomp/append_args-2.f90: Likewise. * gfortran.dg/gomp/append_args-3.f90: Likewise. * gfortran.dg/gomp/append_args-4.f90: Likewise. * gfortran.dg/gomp/interop-1.f90: Likewise. * gfortran.dg/gomp/interop-2.f90: Likewise. * gfortran.dg/gomp/interop-3.f90: Likewise. * gfortran.dg/gomp/interop-4.f90: Likewise. * gfortran.dg/gomp/pr118965-1.f90: New. * gfortran.dg/gomp/pr118965-2.f90: New. --- gcc/c/c-parser.cc | 44 --- gcc/cp/parser.cc | 45 --- gcc/fortran/openmp.cc | 77 +- .../c-c++-common/gomp/append-args-1.c | 4 +- .../c-c++-common/gomp/append-args-7.c | 4 +- .../c-c++-common/gomp/append-args-8.c | 9 ++- .../c-c++-common/gomp/append-args-9.c | 7 +- gcc/testsuite/c-c++-common/gomp/interop-1.c | 78 +-- gcc/testsuite/c-c++-common/gomp/interop-2.c | 62 ++- gcc/testsuite/c-c++-common/gomp/interop-3.c | 26 +++ gcc/testsuite/c-c++-common/gomp/interop-4.c | 8 +- gcc/testsuite/c-c++-common/gomp/pr118965-1.c | 57 ++ gcc/testsuite/c-c++-common/gomp/pr118965-2.c | 31 gcc/testsuite/g++.dg/gomp/append-args-1.C | 6 +- gcc/testsuite/g++.dg/gomp/append-args-2.C | 6 +- gcc/testsuite/g++.dg/gomp/append-args-6.C | 4 +- gcc/testsuite/g++.dg/gomp/append-args-7.C | 2 +- gcc/testsuite/g++.dg/gomp/append-args-8.C | 9 +-- gcc/testsuite/g++.dg/gomp/interop-5.C | 8 +- .../gfortran.dg/gomp/append_args-1.f90| 8 +- .../gfortran.dg/gomp/append_args-2.f90| 24 +++--- .../gfortran.dg/gomp/append_args-3.f90| 2 +- .../gfortran.dg/gomp/
Re: [PATCH v2 0/4] Libsanitizer improvements
HTEC Proprietary Hi, Could you please provide an email address for upstream libsanitizer? We are not sure where to send these patches. Kind regards, Aleksandar Rakic From: Richard Biener Sent: Tuesday, April 1, 2025 2:50 PM To: Aleksandar Rakic Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; c...@mips.com; ja...@redhat.com Subject: Re: [PATCH v2 0/4] Libsanitizer improvements CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. On Mon, Mar 17, 2025 at 2:24 PM Aleksandar Rakic wrote: > > This patch series improves the libsanitizer for the mips target > in GCC. You should send these improvements to upstream libsanitizer instead. > These patches are cherry-picked from the mips_rel/11_2_0/master > and mips_rel/9_3_0/master branches from the MIPS' repository: > https://github.com/MIPS/gcc . > Further details on the individual changes are included in the > respective patches.
Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.
On Mon, Mar 31, 2025 at 11:34 PM Robin Dapp wrote: > > >> Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if > >> AVL < (2 * VLMAX)" rule... > > > > +1, spec has some description about this but I am not sure if I really get > > the point. > > > > From Spec: > > > > "For example, this permits an implementation to set vl = ceil(AVL > > / 2) for VLMAX < AVL < 2*VLMAX in order to evenly > > distribute work over the last two iterations of a stripmine loop. > > Requirement > > 2 ensures that the rst stripmine iteration of reduction > > loops uses the largest vector length of all iterations, even in the case of > > AVL < 2*VLMAX. This allows software to avoid needing to > > explicitly calculate a running maximum of vector lengths observed > > during a stripmined loop. Requirement 2 also allows an > > implementation to set vl to VLMAX for VLMAX < AVL < 2*VLMAX" > > Yeah, that's very unfortunate. > > The rule is something like > > if AVL >= 2 * VLMAX > vl = vsetvl = min (AVL, VLMAX) > > if VLMAX > AVL < 2 * VLMAX > vl = vsetvl = "whatever" ;) Note it's not quite "whatever" -- there is a constraint that vl be monotonically nonincreasing, which in some cases is the only important property. No denying this is an annoyance, though. > > if AVL <= VLMAX > vl = vsetvl = min (AVL, VLMAX) > > The idea of load balancing is alright I guess but it really complicates > matters > in the compiler. > > FWIW my plan for GCC 16 is to define a SELECT_VL_SANE (or any better name I > can > come up with) that doesn't have this behavior and always only performs a > minimum instead. This will allow us to perform scalar evolution on vsetvl > rather than giving up as we do right now. Microarchitectures where vsetvl > always behaves like a minimum would then enable the corresponding > expander/insn > and others would fall back to the current behavior. > > -- > Regards > Robin >
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
Am Dienstag, dem 01.04.2025 um 18:58 + schrieb Qing Zhao: > > > On Apr 1, 2025, at 11:28, Martin Uecker wrote: > > > > Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao: > > > > > > > On Apr 1, 2025, at 10:04, Martin Uecker wrote: > > > > > > > > > > > > > > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling: > > > > > > I'd like to offer up this to solve the issues we're facing. This is > > > > > > a > > > > > > combination of everything that's been discussed here (or at least > > > > > > that > > > > > > I've been able to read in the centi-thread :-). > > > > > > > > Thanks! I think this proposal much better as it avoids undue burden > > > > on parsers, but it does not address all my concerns. > > > > > > > > > > > > From my side, the issue about compromising the scoping rules of C > > > > is also about unintended non-local effects of code changes. In > > > > my opinion, a change in a library header elsewhere should not cause > > > > code in a local scope (which itself might also come from a macro) to > > > > emit a warning or require a programmer to add a workaround. So I am > > > > not convinced that adding warnings or a workaround such as > > > > __builtin_global_ref is a good solution. > > > > > > > > > > > > I could see the following as a possible way forward: We only > > > > allow the following two syntaxes: > > > > > > > > 1. Single argument referring to a member. > > > > > > > > __counted_by(len) > > > > > > > > with an argument that must be a single identifier and where > > > > the identifier then must refer to a struct member. > > > > > > > > (I still think this is not ideal and potentially > > > > confusing, but in contrast to new scoping rules it is > > > > at least relatively easily to explain as a special rule.). > > > > > > So, in allowed syntax 1, the identifier inside counted_by attribute will be > looked up inside > the structure. > > This is our current implementation of the counted_by for FAM and my previous > submitted > patch for counted_by for Pointers inside structures. > > Keeping this syntax is good. > > > > > > > > > 2. Forward declarations. > > > > > > > > __counted_by(size_t len; len + PADDING) > > > > > > In the above, the PADDING is some constant? > > > > In principle - when considering only the name lookup rules - > > it could be a constant, a global variable, or an automatic > > variable, i.e. any ordinary identifiers which is visible at > > this point. > > I am a little confused here: > Is this syntax 2 a new syntax, and with new name lookup rules other than the > syntax 1? Yes. With the regular C name lookup rules other than syntax 1. > > How should the identifiers inside counted_by attribute with this syntax be > looked up? > Inside the structure first? Then if not found, looking up the outer scope for > identifiers in the > PADDING part? The identifier in the forward declaration ("len") will be looked up in the structure and will be made available when parsing the expression. Any other identifiers (such as "PADDING") will not be looked up in the structure. So it is always clear where each identifier is going to be looked up. > Then, has a new scoping been introduced now? > Or some other special looking up rules for counted_by attribute? > > > > > > > > > More complicated expressions involving globals will not be supported? > > > > I think one could allow such expressions, But I think the > > expressions should be restricted to expressions which have > > no side effects. > > See my question in above, does this new syntax 2 introduce a new “structure > scope” to enable > the identifiers to be looked up inside the structure first as syntax 1? Or, > this new syntax has the > same lookup rule as the current C, will NOT look up inside the structure > first? It will NOT look into the structure, except for the forward declared identifier. Martin > > > > > > > > > > where then the second part can also be a more complicated > > > > expression, but with the explicit requirement that all > > > > identifiers in this expression are then looked up according to > > > > regular C language rules. So except for the forward declared > > > > member(s) they are *never* looked up in the member namespace of > > > > the struct, i.e. no new name lookup rules are introduced. > > > > > > One question here: > > > > > > What’s the major issue if we’d like to add one new scoping rule, for > > > example, > > > “Structure scope” (the same as the C++’s instance scope) to C? > > > > > > (In addition to the "VLA in structure" issue I mentioned in my previous > > > writeup, > > > is there any other issue to prevent this new scoping rule being added > > > into C ?). > > > > Note that the "VLA in structure" is a bit of a red herring. The exact same > > issues apply to lookup of any other ordinary identifiers in this context. > > > > enum { MY_BUF_SIZE = 100 }; > > struct foo { > > char buf[M
Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.
Note it's not quite "whatever" -- there is a constraint that vl be monotonically nonincreasing, which in some cases is the only important property. No denying this is an annoyance, though. Yes, I was hoping the smiley would convey that "whatever" was not to be taken literally. In terms of SCEV analysis a minimum is much more helpful. And given our experience with the vector spec so far I have no doubt that there is already a design in the wild that does something other than a minimum in that case. -- Regards Robin
Re: [PATCH v2 0/4] Libsanitizer improvements
On Tue, Apr 1, 2025 at 12:02 PM Aleksandar Rakic wrote: > > HTEC Proprietary > > Hi, > > Could you please provide an email address for upstream libsanitizer? > We are not sure where to send these patches. It is mentioned in libsantizer/README.gcc : ``` Both tools consist of a compiler module and a run-time library. The sources of the run-time library for these projects are hosted at https://github.com/llvm/llvm-project in the following directories: ``` Thanks, Andrew > > Kind regards, > Aleksandar Rakic > > > From: Richard Biener > Sent: Tuesday, April 1, 2025 2:50 PM > To: Aleksandar Rakic > Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; c...@mips.com; > ja...@redhat.com > Subject: Re: [PATCH v2 0/4] Libsanitizer improvements > > CAUTION: This email originated from outside of the organization. Do not click > links or open attachments unless you recognize the sender and know the > content is safe. > > > On Mon, Mar 17, 2025 at 2:24 PM Aleksandar Rakic > wrote: > > > > This patch series improves the libsanitizer for the mips target > > in GCC. > > You should send these improvements to upstream libsanitizer instead. > > > These patches are cherry-picked from the mips_rel/11_2_0/master > > and mips_rel/9_3_0/master branches from the MIPS' repository: > > https://github.com/MIPS/gcc . > > Further details on the individual changes are included in the > > respective patches.
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
> On Apr 1, 2025, at 15:25, Martin Uecker wrote: > > Am Dienstag, dem 01.04.2025 um 18:58 + schrieb Qing Zhao: >> >>> On Apr 1, 2025, at 11:28, Martin Uecker wrote: >>> >>> Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao: > On Apr 1, 2025, at 10:04, Martin Uecker wrote: > > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling: >>> I'd like to offer up this to solve the issues we're facing. This is a >>> combination of everything that's been discussed here (or at least that >>> I've been able to read in the centi-thread :-). > > Thanks! I think this proposal much better as it avoids undue burden > on parsers, but it does not address all my concerns. > > > From my side, the issue about compromising the scoping rules of C > is also about unintended non-local effects of code changes. In > my opinion, a change in a library header elsewhere should not cause > code in a local scope (which itself might also come from a macro) to > emit a warning or require a programmer to add a workaround. So I am > not convinced that adding warnings or a workaround such as > __builtin_global_ref is a good solution. > > > I could see the following as a possible way forward: We only > allow the following two syntaxes: > > 1. Single argument referring to a member. > > __counted_by(len) > > with an argument that must be a single identifier and where > the identifier then must refer to a struct member. > > (I still think this is not ideal and potentially > confusing, but in contrast to new scoping rules it is > at least relatively easily to explain as a special rule.). > >> >> So, in allowed syntax 1, the identifier inside counted_by attribute will be >> looked up inside >> the structure. >> >> This is our current implementation of the counted_by for FAM and my previous >> submitted >> patch for counted_by for Pointers inside structures. >> >> Keeping this syntax is good. >> > > 2. Forward declarations. > > __counted_by(size_t len; len + PADDING) In the above, the PADDING is some constant? >>> >>> In principle - when considering only the name lookup rules - >>> it could be a constant, a global variable, or an automatic >>> variable, i.e. any ordinary identifiers which is visible at >>> this point. >> >> I am a little confused here: >> Is this syntax 2 a new syntax, and with new name lookup rules other than the >> syntax 1? > > Yes. With the regular C name lookup rules other than syntax 1. > >> >> How should the identifiers inside counted_by attribute with this syntax be >> looked up? >> Inside the structure first? Then if not found, looking up the outer scope >> for identifiers in the >> PADDING part? > > The identifier in the forward declaration ("len") will be looked > up in the structure and will be made available when parsing > the expression. Any other identifiers (such as "PADDING") > will not be looked up in the structure. So it is always > clear where each identifier is going to be looked up. Yeah, this sounds a good idea to me, and a nice compromise solution. -:) Then, if more than one members need to be in the expression, for example: int number; struct A { size_t count_1; size_t count_2; char *array __counted_by (size_t count_1; size_t count_2; count1 + count2 + number * 4) } i.e., all the members that will be in the counted_by expression should be declared first inside the counted_by, then all other variables in the expression could be looked up per the default scoping rule. Is the understanding correct? Qing > >> Then, has a new scoping been introduced now? >> Or some other special looking up rules for counted_by attribute? >> >>> More complicated expressions involving globals will not be supported? >>> >>> I think one could allow such expressions, But I think the >>> expressions should be restricted to expressions which have >>> no side effects. >> >> See my question in above, does this new syntax 2 introduce a new “structure >> scope” to enable >> the identifiers to be looked up inside the structure first as syntax 1? Or, >> this new syntax has the >> same lookup rule as the current C, will NOT look up inside the structure >> first? > > It will NOT look into the structure, except for the forward > declared identifier. > > > Martin > >> >>> > where then the second part can also be a more complicated > expression, but with the explicit requirement that all > identifiers in this expression are then looked up according to > regular C language rules. So except for the forward declared > member(s) they are *never* looked up in the member namespace of > the struct, i.e. no new name lookup rules are introduced. One question here: What’s the major issue if we’d like to
Re: [PATCH] [testsuite] [riscv] limit vwaddsub-1.c to rv64
On 4/1/25 8:03 AM, Kito Cheng wrote: On Tue, Apr 1, 2025 at 12:47 PM Jeff Law wrote: On 3/31/25 7:03 PM, Alexandre Oliva wrote: On Mar 31, 2025, Jeff Law wrote: I don't immediately see anything in this test or its history to indicate it's only supposed to work for rv64. It's the 64-bit integral argument rs1. Right, but ISTM we ought to be able to handle a vector of 64bit integral types, especially from the intrinsics interface. Probably the thing to do is see what the intrinsics docs say. Or maybe Kito knows offhand. Long story, fortunately we have some record on that: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/9 https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/199 Thanks. While they don't address this issue specifically, Craig's comments seem to indicate a desire to support the intrinsics in these scenarios and the compiler has to deal with the 64bit integer problems on rv32. I don't think it changes anything for Alex, we still need to disable the test for rv32 as the expected sequence just isn't right in that case. jeff
Re: [PATCH] aarch64: Deprecate -march= for the month of April
On 01/04/2025 09:42, Kyrylo Tkachov wrote: > Hi all, > > As we're starting a new month, introduce a more appropriate -mapril= > to specify the compilation target instead. > This helps keep GCC more up to date with the passage of time. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Signed-off-by: Kyrylo Tkachov > > gcc/ > > * config/aarch64/aarch64.opt (mapril): Define. > * doc/invoke.texi (AArch64 Options): Document the above. > Deprecate -march=. > Nice one. But I think we'll need to wait until -may before committing this. R.
Re: [PATCH] c++: Fix typo in RAW_DATA_CST build_list_conv subsubconv hanling [PR119563]
On Tue, Apr 01, 2025 at 04:31:38PM +0200, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs (the embed one actually doesn't but > dereferences random uninitialized pointer far after allocated memory) > because of a typo. In the RAW_DATA_CST handling of list conversion > where there are conversions to something other than > initializer_list<{{,un}signed ,}char>, the code now calls > implicit_conversion for all the RAW_DATA_CST elements and stores them > into subsubconvs array. > The next loop (done in a separate loop because subsubconvs[0] is > handled differently) attempts to do the > for (i = 0; i < len; ++i) > { > conversion *sub = subconvs[i]; > if (sub->rank > t->rank) > t->rank = sub->rank; > if (sub->user_conv_p) > t->user_conv_p = true; > if (sub->bad_p) > t->bad_p = true; > } > rank/user_conv_p/bad_p merging, but I mistyped the index, the loop > iterates with j iterator and i is subconvs index, so the loop effectively > doesn't do anything interesting except for merging from one of the > subsubconvs element, if lucky within the subsubconvs array, if unlucky > not even from inside of the array. > > The following patch fixes that. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? LGTM. > 2025-04-01 Andrew Pinski > Jakub Jelinek > > PR c++/119563 > * call.cc (build_list_conv): Fix a typo in loop gathering > summary information from subsubconvs. > > * g++.dg/cpp0x/pr119563.C: New test. > * g++.dg/cpp/embed-26.C: New test. > > --- gcc/cp/call.cc.jj 2025-03-06 11:08:20.657231238 +0100 > +++ gcc/cp/call.cc2025-04-01 11:25:14.030321320 +0200 > @@ -917,7 +917,7 @@ build_list_conv (tree type, tree ctor, i > t->rank = cr_exact; > for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j) > { > - sub = subsubconvs[i]; > + sub = subsubconvs[j]; > if (sub->rank > t->rank) > t->rank = sub->rank; > if (sub->user_conv_p) > --- gcc/testsuite/g++.dg/cpp0x/pr119563.C.jj 2025-04-01 11:14:24.702394749 > +0200 > +++ gcc/testsuite/g++.dg/cpp0x/pr119563.C 2025-04-01 11:13:50.962866358 > +0200 > @@ -0,0 +1,79 @@ > +// PR c++/119563 > +// { dg-do run { target c++11 } } > +// { dg-options "-O2" } > + > +namespace std { > +template > +struct initializer_list { > +private: > + T *_M_array; > + decltype (sizeof 0) _M_len; > +public: > + constexpr decltype (sizeof 0) > + size () const noexcept { return _M_len; } > + constexpr const T * > + begin () const noexcept { return _M_array; } > + constexpr const T * > + end () const noexcept { return begin () + size (); } > +}; > +} > + > +struct A {} a; > + > +struct B { > + constexpr B (int x) : B (a, x) {} > + template > + constexpr B (A, T... x) : b(x...) {} > + int b; > +}; > + > +struct C { > + C (std::initializer_list x) > + { > +if (x.size () != 130 + 1024 + 130) > + __builtin_abort (); > +unsigned int i = 1, j = 0; > +for (auto a = x.begin (); a < x.end (); ++a) > + if (a->b != i) > + __builtin_abort (); > + else > + { > + if (j == 129 || j == 129 + 1024) > + i = 0; > + i = (i & 15) + 1; > + ++j; > + } > +c = true; > + } > + bool c; > +}; > + > +#define D 1 + 0, 2 + 0, 3 + 0, 4 + 0, 5 + 0, 6 + 0, 7 + 0, 8 + 0, \ > + 9 + 0, 10 + 0, 11 + 0, 12 + 0, 13 + 0, 14 + 0, 15 + 0, 16 + 0 > +#define E D, D, D, D, D, D, D, D > + > +C c { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, E, E, E, E, E, E, E, E, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > + 1, 2 }; > + > +int > +main () > +{ > + if (!c.c) > +__builtin_abort (); > +} > --- gcc/testsuite/g++.dg/cpp/embed-26.C.jj2025-04-01 11:18:56.563595381 > +0200 > +++ gcc/testsuite/g++.dg/cpp/embed-26.C 2025-04-01 11:18:35.444890450 > +0200 > @@ -0,0 +1,63 @@ > +// PR c++/119563 > +// { dg-do run { target c++11 } } > +// { dg-options "-O2" } > + > +namespace std { > +template >
[PATCH] PR119482: Avoid mispredictions in bitmap_set_bit
From: Andi Kleen This isn't a regression, but it's a very simple patch with high performance improvement, so perhaps suitable in the current stage. --- bitmap_set_bit checks the original value of the bit to return it to the caller and then only writes the new value back if it changes. Most callers of bitmap_set_bit don't need the return value, but with the conditional store the CPU still has to predict it correctly since gcc doesn't know how to do that without APX on x86 (even though CMOV could do it with a dummy target). Really if-conversion should handle this case, but for now we can fix it. This simple patch improves runtime by 15% for the test case in the PR. Which is more than I expected given it only has ~1.44% of the cycles, but I guess the mispredicts caused some down stream effects. ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet ran 1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet At least with this test case the total number of branches decreases drastically. Even though the mispredict rate goes up slightly it is still a big win. $ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \ -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet -w Performance counter stats for 'system wide': 41,932,957,091 branches 686,117,623 branch-misses#1.64% of all branches 43,690.47 MiB uncore_imc/cas_count_read/ 12,362.56 MiB uncore_imc/cas_count_write/ 49.328633365 seconds time elapsed $ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \ -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet -w Performance counter stats for 'system wide': 37,092,113,179 branches 663,641,708 branch-misses#1.79% of all branches 43,196.52 MiB uncore_imc/cas_count_read/ 12,369.33 MiB uncore_imc/cas_count_write/ 42.632458350 seconds time elapsed gcc/ChangeLog: PR middle-end/119482 * bitmap.cc (bitmap_set_bit): Write back value unconditionally --- gcc/bitmap.cc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc index f5a64b495ab3..7744f8f8c2e4 100644 --- a/gcc/bitmap.cc +++ b/gcc/bitmap.cc @@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit) if (ptr != 0) { bool res = (ptr->bits[word_num] & bit_val) == 0; - if (res) - ptr->bits[word_num] |= bit_val; + ptr->bits[word_num] |= bit_val; return res; } -- 2.49.0
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
On Tue, Apr 1, 2025 at 5:41 AM Michael Matz wrote: > > Hello, > > On Mon, 31 Mar 2025, Bill Wendling wrote: > > > 1. The use of '__self' isn't feasible, so we won't use it. > > That's a bold statement. How's that? The only thing I read here is that > the very spelling of "self" was objected to. So, call it _Self, > _Selfref, or something. Even _Whatever42 would be better. > > Lookup rule changes only for identifiers used in attributes (all of them? > some of them?) is ... well, not good language design? I think you used > the word "overreaching" upthread, correctly so. And that for wanting to > avoid the established member-lookup syntax A.B by way of "we won't use > it"? That seems a terrible tradeoff. I don't understand the push for > that approach, at all. > It's not the name but the use of the identifier itself. If we can't enforce using it in every situation it's no longer useful. This is something that Apple (and the Clang community in general) is very against doing (not the identifier, but requiring its use). I still agree that adding a new scoping rule is overreach, but I'm trying to compromise so that we don't have multiple compilers with competing syntaxes. And as Yeoul pointed out in her RFC, we already have a similar lookup scheme with the "guarded_by" attribute. -bw
[COMMITTED 16/35] gccrs: Fix ICE when array elements are not a value
From: Philip Herron We need to check for error_mark_node when doing adjustments from coercion sites otherwise we hit assetions as part of the coercion. That fixes the ICE but the reason for the error_mark_node is because the array element value. Fixes Rust-GCC#3567 gcc/rust/ChangeLog: * backend/rust-compile-expr.cc (CompileExpr::array_value_expr): add value chk for array expr gcc/testsuite/ChangeLog: * rust/compile/issue-3567.rs: New test. Signed-off-by: Philip Herron --- gcc/rust/backend/rust-compile-expr.cc| 11 +++ gcc/testsuite/rust/compile/issue-3567.rs | 4 2 files changed, 15 insertions(+) create mode 100644 gcc/testsuite/rust/compile/issue-3567.rs diff --git a/gcc/rust/backend/rust-compile-expr.cc b/gcc/rust/backend/rust-compile-expr.cc index 21f4ca1ba18..e4ab9f010b8 100644 --- a/gcc/rust/backend/rust-compile-expr.cc +++ b/gcc/rust/backend/rust-compile-expr.cc @@ -1902,6 +1902,14 @@ CompileExpr::array_value_expr (location_t expr_locus, for (auto &elem : elems.get_values ()) { tree translated_expr = CompileExpr::Compile (*elem, ctx); + if (translated_expr == error_mark_node) + { + rich_location r (line_table, expr_locus); + r.add_fixit_replace (elem->get_locus (), "not a value"); + rust_error_at (r, ErrorCode::E0423, "expected value"); + return error_mark_node; + } + constructor.push_back (translated_expr); indexes.push_back (i++); } @@ -2003,6 +2011,9 @@ HIRCompileBase::resolve_adjustements ( tree e = expression; for (auto &adjustment : adjustments) { + if (e == error_mark_node) + return error_mark_node; + switch (adjustment.get_type ()) { case Resolver::Adjustment::AdjustmentType::ERROR: diff --git a/gcc/testsuite/rust/compile/issue-3567.rs b/gcc/testsuite/rust/compile/issue-3567.rs new file mode 100644 index 000..021d9c27838 --- /dev/null +++ b/gcc/testsuite/rust/compile/issue-3567.rs @@ -0,0 +1,4 @@ +fn main() { +let _: &[i8] = &[i8]; +// { dg-error "expected value .E0423." "" { target *-*-* } .-1 } +} -- 2.49.0
Patch ping [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]
Hi! On Tue, Mar 25, 2025 at 08:34:10AM +0100, Jakub Jelinek wrote: > As discussed here and in bugzilla, [[clang::musttail]] attribute in clang > not just strongly asks for tail call or error, but changes behavior. > To quote: > https://clang.llvm.org/docs/AttributeReference.html#musttail > "The lifetimes of all local variables and function parameters end immediately > before the call to the function. This means that it is undefined behaviour > to pass a pointer or reference to a local variable to the called function, > which is not the case without the attribute. Clang will emit a warning in > common cases where this happens." > > The GCC behavior was just to error if we can't prove the musttail callee > could not have dereferenced escaped pointers to local vars or parameters > of the caller. That is still the case for variables with non-trivial > destruction (even in clang), like vars with C++ non-trivial destructors or > variables with cleanup attribute. > > The following patch changes the behavior to match that of clang, for all of > [[clang::musttail]], [[gnu::musttail]] and __attribute__((musttail)). > > clang 20 actually added warning for some cases of it in > https://github.com/llvm/llvm-project/pull/109255 > but it is under -Wreturn-stack-address warning. > > Now, gcc doesn't have that warning, but -Wreturn-local-addr instead, and > IMHO it is better to have this under new warnings, because this isn't about > returning local address, but about passing it to a musttail call, or maybe > escaping to a musttail call. And perhaps users will appreciate they can > control it separately as well. > > The patch introduces 2 new warnings. > -Wmusttail-local-addr > which is turn on by default and warns for the always dumb cases of passing > an address of a local variable or parameter to musttail call's argument. > And then > -Wmaybe-musttail-local-addr > which is only diagnosed if -Wmusttail-local-addr was not diagnosed and > diagnoses at most one (so that we don't emit 100s of warnings for one call > if 100s of vars can escape) case where an address of a local var could have > escaped to the musttail call. This is less severe, the code doesn't have > to be obviously wrong, so the warning is only enabled in -Wextra. > > And I've adjusted also the documentation for this change and addition of > new warnings. I'd like to ping the https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679182.html patch. I know it is quite controversial and if clang wouldn't be the first to implement this I'd certainly not go that way; I am willing to change the warning option names or move the maybe one from -Wextra to -Wall if there is agreement on that. Unfortunately there seems to be quite a lot of code in the wild that uses this attribute already and without this patch GCC 15 will simply not be able to compile that (whether it is firefox (skia etc.), protobuf, ...). The only other option is IMHO to drop the musttail attribute support for GCC 15 or name it differently with different semantics. But not sure projects in the wild will like to annotate their calls with two different musttail like attributes if they satisfy behavior of both. Jakub
Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]
On 4/1/25 12:15 PM, Vineet Gupta wrote: On 3/31/25 23:48, Heinrich Schuchardt wrote: On 3/30/25 01:49, Vineet Gupta wrote: changes since v2 - dump log sanfu --- vsetvl phase4 uses LCM guided info to insert VSETVL insns. It has an additional loop to insert missing vsetvls on certain edges. Currently it asserts/aborts on encountering EDGE_ABNORMAL. When enabling go frontend with V enabled, libgo build hits the assert. It seems to be safe to just skip the abnormal edge. Hello Vineet, Is there a test case where only following an abnormal edge between code blocks would require to call VSETVL? In the sketched code below following the exception would require VSETVL to be called while the edge from the try block to the finally block would not require this. try { for (..) { uint16_t vectorizable math if (condition) throw exception uint16_t vectorizable math } for (..) { uint8_t vectorizable math } } catch exception { } finally for (..) { uint8_t vectorizable math } } Yeah we are going to run testsuite with -fnon-call-exceptions to find such cases. But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs. code complexity trade-off). After all we'd just endup with an extra VSETVL in the finally block. I'd look at that skeleton code as a way to potentially trip this issue without needing golang. Jeff
[PATCH, V6] PR target/118541 - Do not generate unordered fp cmoves for IEEE compares on PowerPC
Bernhard Reutner-Fischer suggested some typos to the patch for 118551. Here is the changed patch. In bug PR target/118541 on power9, power10, and power11 systems, for the function: extern double __ieee754_acos (double); double __acospi (double x) { double ret = __ieee754_acos (x) / 3.14; return __builtin_isgreater (ret, 1.0) ? 1.0 : ret; } GCC currently generates the following code: Power9 Power10 and Power11 == === bl __ieee754_acos bl __ieee754_acos@notoc nop plfd 0,.LC0@pcrel addis 9,2,.LC2@toc@ha xxspltidp 12,1065353216 addi 1,1,32 addi 1,1,32 lfd 0,.LC2@toc@l(9) ld 0,16(1) addis 9,2,.LC0@toc@ha fdiv 0,1,0 ld 0,16(1) mtlr 0 lfd 12,.LC0@toc@l(9)xscmpgtdp 1,0,12 fdiv 0,1,0 xxsel 1,0,12,1 mtlr 0 blr xscmpgtdp 1,0,12 xxsel 1,0,12,1 blr This is because ifcvt.c optimizes the conditional floating point move to use the XSCMPGTDP instruction. However, the XSCMPGTDP instruction will generate an interrupt if one of the arguments is a signalling NaN and signalling NaNs can generate an interrupt. The IEEE comparison functions (isgreater, etc.) require that the comparison not raise an interrupt. The following patch changes the PowerPC back end so that ifcvt.c will not change the if/then test and move into a conditional move if the comparison is one of the comparisons that do not raise an error with signalling NaNs and -Ofast is not used. If a normal comparison is used or -Ofast is used, GCC will continue to generate XSCMPGTDP and XXSEL. For the following code: double ordered_compare (double a, double b, double c, double d) { return __builtin_isgreater (a, b) ? c : d; } /* Verify normal > does generate xscmpgtdp. */ double normal_compare (double a, double b, double c, double d) { return a > b ? c : d; } with the following patch, GCC generates the following for power9, power10, and power11: ordered_compare: fcmpu 0,1,2 fmr 1,4 bnglr 0 fmr 1,3 blr normal_compare: xscmpgtdp 1,1,2 xxsel 1,4,3,1 blr I have built bootstrap compilers on big endian power9 systems and little endian power9/power10 systems and there were no regressions. Can I check this patch into the GCC trunk, and after a waiting period, can I check this into the active older branches? 2025-04-01 Michael Meissner gcc/ PR target/118541 * config/rs6000/predicates.md (invert_fpmask_comparison_operator): Do not allow UNLT and UNLE unless -ffast-math. * config/rs6000/rs6000-protos.h (enum rev_cond_ordered): New enumeration. (rs6000_reverse_condition): Add argument. * config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow ordered comparisons to be reversed for floating point conditional moves, but allow ordered comparisons to be reversed on jumps. (rs6000_emit_sCOND): Adjust rs6000_reverse_condition call. * config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise. * config/rs6000/rs6000.md (reverse_branch_comparison): Name insn. Adjust rs6000_reverse_condition calls. gcc/testsuite/ PR target/118541 * gcc.target/powerpc/pr118541-1.c: New test. * gcc.target/powerpc/pr118541-2.c: Likewise. * gcc.target/powerpc/pr118541-3.c: Likewise. * gcc.target/powerpc/pr118541-4.c: Likewise. --- gcc/config/rs6000/predicates.md | 10 +++- gcc/config/rs6000/rs6000-protos.h | 17 ++- gcc/config/rs6000/rs6000.cc | 46 ++- gcc/config/rs6000/rs6000.h| 10 +++- gcc/config/rs6000/rs6000.md | 25 ++ gcc/testsuite/gcc.target/powerpc/pr118541-1.c | 28 +++ gcc/testsuite/gcc.target/powerpc/pr118541-2.c | 26 +++ gcc/testsuite/gcc.target/powerpc/pr118541-3.c | 26 +++ gcc/testsuite/gcc.target/powerpc/pr118541-4.c | 26 +++ 9 files changed, 190 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-3.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-4.c diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 647e89afb6a..ba8df6a7979 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1466,8 +146
[COMMITTED] Doc: Document _Bool type as C90 extension [PR118118]
gcc/ChangeLog PR c/118118 * doc/extend.texi (Boolean Type): New section. --- gcc/doc/extend.texi | 12 1 file changed, 12 insertions(+) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 76fb210060d..d2bf6048be7 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -12944,6 +12944,7 @@ C and/or C++ standards, while others remain specific to GNU C. * Offsetof::Special syntax for @code{offsetof}. * Alignment:: Determining the alignment of a function, type or variable. * Enum Extensions:: Forward declarations and specifying the underlying type. +* Boolean Type:: Support for the @code{_Bool} keyword. * Variadic Macros:: Macros with a variable number of arguments. * Conditionals::Omitting the middle operand of a @samp{?:} expression. * Case Ranges:: `case 1 ... 9' and such. @@ -13698,6 +13699,17 @@ Forward-declaring an incomplete enum type without an explicit underlying type is supported as an extension in all GNU C dialects, but is not supported at all in GNU C++. +@node Boolean Type +@subsection Support for the @code{_Bool} Type +@cindex boolean type +@cindex @code{_Bool} keyword + +The C99 standard added @code{_Bool} as a C language keyword naming the +boolean type. As an extension, GNU C also recognizes @code{_Bool} in +C90 mode as well as with @option{-std=c99} and later. + +GNU C++ does not support the @code{_Bool} keyword. + @node Variadic Macros @subsection Macros with a Variable Number of Arguments. @cindex variable number of arguments -- 2.34.1
[COMMITTED] Doc: Cross-reference constructor and init_priority attributes [PR118982]
Per the issue, the discussion of these two attributes needed to be better integrated. I also did some editing for style and readability, and clarified that almost all targets support this feature (it is enabled by default unless the back end disables it), not just "some". Co-Authored_by: Jonathan Wakely gcc/ChangeLog PR c++/118982 * doc/extend.texi (Common Function Attributes): For the constructor/destructory attribute, be more explicit about the relationship between the constructor attribute and the C++ init_priority attribute, and add a cross-reference. Also document that most targets support this. (C++ Attributes): Similarly for the init_priority attribute. --- gcc/doc/extend.texi | 49 - 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index d2bf6048be7..48c3aeb1318 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -2035,24 +2035,32 @@ called. Functions with these attributes are useful for initializing data that is used implicitly during the execution of the program. -On some targets the attributes also accept an integer argument to +On most targets the attributes also accept an integer argument to specify a priority to control the order in which constructor and -destructor functions are run. A constructor -with a smaller priority number runs before a constructor with a larger -priority number; the opposite relationship holds for destructors. Note -that priorities 0-100 are reserved. So, if you have a constructor that +destructor functions are run. The @var{priority} argument is a +constant integral expression bounded between 101 and 65535 inclusive; +priorities 0-100 are reserved for use by the compiler and its runtime +libraries. +A constructor with a smaller priority number runs before a constructor with +a larger priority number; the opposite relationship holds for destructors. +So, if you have a constructor that allocates a resource and a destructor that deallocates the same -resource, both functions typically have the same priority. The -priorities for constructor and destructor functions are the same as -those specified for namespace-scope C++ objects (@pxref{C++ Attributes}). -However, at present, the order in which constructors for C++ objects -with static storage duration and functions decorated with attribute -@code{constructor} are invoked is unspecified. In mixed declarations, -attribute @code{init_priority} can be used to impose a specific ordering. +resource, both functions typically have the same priority. -Using the argument forms of the @code{constructor} and @code{destructor} -attributes on targets where the feature is not supported is rejected with -an error. +The order in which constructors for C++ objects with static storage +duration are invoked relative to functions decorated with attribute +@code{constructor} is normally unspecified. You can use +attribute @code{init_priority} (@pxref{C++ Attributes}) on the +declarations of namespace-scope C++ objects to impose a specific +ordering; the @var{priority} for the @code{init_priority} attribute +has the same effect as the @var{priority} for the @code{constructor} +attribute. + +Using the argument form of the @code{constructor} and +@code{destructor} attributes on targets where the feature is not +supported is rejected with an error. Only a few targets (typically +those not using ELF object format, or the GNU linker) reject this +usage. @cindex @code{copy} function attribute @item copy @@ -30291,11 +30299,11 @@ variable or function or moving it into a tagged inline namespace. In Standard C++, objects defined at namespace scope are guaranteed to be initialized in an order in strict accordance with that of their definitions @emph{in a given translation unit}. No guarantee is made for initializations -across translation units. However, GNU C++ allows users to control the +across translation units. However, GNU C++ allows you to control the order of initialization of objects defined at namespace scope with the @code{init_priority} attribute by specifying a relative @var{priority}, -a constant integral expression currently bounded between 101 and 65535 -inclusive. Lower numbers indicate a higher priority. +with the same meaning as for the @code{constructor} attribute +(@pxref{Common Function Attributes}). In the following example, @code{A} would normally be created before @code{B}, but the @code{init_priority} attribute reverses that order: @@ -30309,6 +30317,11 @@ Some_Class B __attribute__ ((init_priority (543))); Note that the particular values of @var{priority} do not matter; only their relative ordering. +As with the @var{priority} argument to the @code{constructor} and +@code{destructor} attributes, a few targets do not support the +@code{init_priority} attribute. In that case the attribute is rejected +with an error rat
Re: Patch ping [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]
> I'd like to ping the > https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679182.html > patch. > I know it is quite controversial and if clang wouldn't be the first > to implement this I'd certainly not go that way; I am willing to change > the warning option names or move the maybe one from -Wextra to -Wall > if there is agreement on that. Unfortunately there seems to be > quite a lot of code in the wild that uses this attribute already > and without this patch GCC 15 will simply not be able to compile that > (whether it is firefox (skia etc.), protobuf, ...). FWIW the patch looks good to me. > > The only other option is IMHO to drop the musttail attribute support > for GCC 15 or name it differently with different semantics. > But not sure projects in the wild will like to annotate their calls > with two different musttail like attributes if they satisfy behavior > of both. Projects that are serious about the performance will also need to use __attribute__((no_callee_saved_registers)) and that is different than the clang equivalent. So some ifdefery is needed anyways. -Andi
[COMMITTED 09/35] gccrs: nr2.0: Fix test const_generics_3.rs
From: Owen Avery gcc/testsuite/ChangeLog: * rust/compile/const_generics_3.rs: Modify test to run with name resolution 2.0 only and to handle the absence of a bogus resolution error. * rust/compile/nr2/exclude: Remove const_generics_3.rs. Signed-off-by: Owen Avery --- gcc/testsuite/rust/compile/const_generics_3.rs | 8 +--- gcc/testsuite/rust/compile/nr2/exclude | 1 - 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/rust/compile/const_generics_3.rs b/gcc/testsuite/rust/compile/const_generics_3.rs index e4e9008c4a6..524d48d5bcf 100644 --- a/gcc/testsuite/rust/compile/const_generics_3.rs +++ b/gcc/testsuite/rust/compile/const_generics_3.rs @@ -1,10 +1,12 @@ -// { dg-additional-options "-w" } +// { dg-additional-options "-w -frust-name-resolution-2.0" } + +#[lang = "sized"] +trait Sized {} const M: usize = 4; struct Foo { -// FIXME: This error is bogus. But having it means parsing is valid! -value: [i32; N], // { dg-error "cannot find value .N. in this scope" } +value: [T; N], } fn main() { diff --git a/gcc/testsuite/rust/compile/nr2/exclude b/gcc/testsuite/rust/compile/nr2/exclude index 45e90a4df93..71c2b682bf0 100644 --- a/gcc/testsuite/rust/compile/nr2/exclude +++ b/gcc/testsuite/rust/compile/nr2/exclude @@ -1,6 +1,5 @@ canonical_paths1.rs cfg1.rs -const_generics_3.rs generics9.rs issue-2043.rs issue-2330.rs -- 2.49.0
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
> On Apr 1, 2025, at 11:28, Martin Uecker wrote: > > Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao: >> >>> On Apr 1, 2025, at 10:04, Martin Uecker wrote: >>> >>> >>> >>> Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling: > I'd like to offer up this to solve the issues we're facing. This is a > combination of everything that's been discussed here (or at least that > I've been able to read in the centi-thread :-). >>> >>> Thanks! I think this proposal much better as it avoids undue burden >>> on parsers, but it does not address all my concerns. >>> >>> >>> From my side, the issue about compromising the scoping rules of C >>> is also about unintended non-local effects of code changes. In >>> my opinion, a change in a library header elsewhere should not cause >>> code in a local scope (which itself might also come from a macro) to >>> emit a warning or require a programmer to add a workaround. So I am >>> not convinced that adding warnings or a workaround such as >>> __builtin_global_ref is a good solution. >>> >>> >>> I could see the following as a possible way forward: We only >>> allow the following two syntaxes: >>> >>> 1. Single argument referring to a member. >>> >>> __counted_by(len) >>> >>> with an argument that must be a single identifier and where >>> the identifier then must refer to a struct member. >>> >>> (I still think this is not ideal and potentially >>> confusing, but in contrast to new scoping rules it is >>> at least relatively easily to explain as a special rule.). >>> So, in allowed syntax 1, the identifier inside counted_by attribute will be looked up inside the structure. This is our current implementation of the counted_by for FAM and my previous submitted patch for counted_by for Pointers inside structures. Keeping this syntax is good. >>> >>> 2. Forward declarations. >>> >>> __counted_by(size_t len; len + PADDING) >> >> In the above, the PADDING is some constant? > > In principle - when considering only the name lookup rules - > it could be a constant, a global variable, or an automatic > variable, i.e. any ordinary identifiers which is visible at > this point. I am a little confused here: Is this syntax 2 a new syntax, and with new name lookup rules other than the syntax 1? How should the identifiers inside counted_by attribute with this syntax be looked up? Inside the structure first? Then if not found, looking up the outer scope for identifiers in the PADDING part? Then, has a new scoping been introduced now? Or some other special looking up rules for counted_by attribute? > >> >> More complicated expressions involving globals will not be supported? > > I think one could allow such expressions, But I think the > expressions should be restricted to expressions which have > no side effects. See my question in above, does this new syntax 2 introduce a new “structure scope” to enable the identifiers to be looked up inside the structure first as syntax 1? Or, this new syntax has the same lookup rule as the current C, will NOT look up inside the structure first? > >> >>> where then the second part can also be a more complicated >>> expression, but with the explicit requirement that all >>> identifiers in this expression are then looked up according to >>> regular C language rules. So except for the forward declared >>> member(s) they are *never* looked up in the member namespace of >>> the struct, i.e. no new name lookup rules are introduced. >> >> One question here: >> >> What’s the major issue if we’d like to add one new scoping rule, for example, >> “Structure scope” (the same as the C++’s instance scope) to C? >> >> (In addition to the "VLA in structure" issue I mentioned in my previous >> writeup, >> is there any other issue to prevent this new scoping rule being added into C >> ?). > > Note that the "VLA in structure" is a bit of a red herring. The exact same > issues apply to lookup of any other ordinary identifiers in this context. > > enum { MY_BUF_SIZE = 100 }; > struct foo { > char buf[MY_BUF_SIZE]; > }; > Yes, this is because there is NO “structure scope” available in C. As long as the “structure scope” is added into C, identifiers could be looked up inside the “structure scope” first before looking up outer scopes. > > C++ has instance scope for member functions. The rules for C++ are also > complex and not very consistent (see the examples I posted earlier, > demonstrating UB and compiler divergence). Yes, I studied those C++ examples when I wrote the proposal. And my observation was: in C++, the instance scope always has higher priority than local and global scopes. i.e, when there is a conflict between instance scope and local/global scope for the identifier, The identifier within the instance scope will shadow the one with the same name in the outer scope. But in C, there is No concept of “structure scope” at all. Identifiers will NOT looked u
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
On Tue, Apr 01, 2025 at 05:13:46PM -0700, Bill Wendling wrote: > On Tue, Apr 1, 2025 at 8:29 AM Martin Uecker wrote: > > Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao: > > > > On Apr 1, 2025, at 10:04, Martin Uecker wrote: > > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling: > > > > > > I'd like to offer up this to solve the issues we're facing. This is > > > > > > a > > > > > > combination of everything that's been discussed here (or at least > > > > > > that > > > > > > I've been able to read in the centi-thread :-). > > > > > > > > Thanks! I think this proposal much better as it avoids undue burden > > > > on parsers, but it does not address all my concerns. > > > > > > > > > > > > From my side, the issue about compromising the scoping rules of C > > > > is also about unintended non-local effects of code changes. In > > > > my opinion, a change in a library header elsewhere should not cause > > > > code in a local scope (which itself might also come from a macro) to > > > > emit a warning or require a programmer to add a workaround. So I am > > > > not convinced that adding warnings or a workaround such as > > > > __builtin_global_ref is a good solution. > > To clarify, I'm not in favor of adding a generalized new scoping rule > to C, but only for identifiers in attributes. From what I've seen in There are existing attributes which support general expressions in their arguments, e.g. the assume attribute, OpenMP attributes etc. Those definitely shouldn't change behavior if some new behavior for identifier lookup in attributes is added. Many others do support identifiers as their arguments (e.g. mode attribute, 2 argument malloc attribute, ...). Those can't change behavior either. I think using either a predefined identifier or self-declared one is the only reasonable way to go (whether it represents something like this pointer in C++ or in the latter case possibly forward declaration of the member), otherwise it will be hard to understand for users and very much error prone. Jakub
Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern [combine question]
Segher -- there's a combine question near the end... On 3/23/25 8:43 PM, Bohan Lei wrote: The combine pass can generate an index like (and:DI (mult:DI (reg:DI) (const_int scale)) (const_int mask)) when XTheadMemIdx is available. LRA may pull it out, and thus a splitter is needed when Zba is not available. A similar splitter were introduced when XTheadMemIdx support was added, but removed in commit 31c3c5d. The new splitter in this new patch is based on the removed one. gcc/ChangeLog: * config/riscv/thead.md (*th_memidx_operand): New splitter. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadmemidx-bug.c: New test. --- gcc/config/riscv/thead.md | 21 +++ .../gcc.target/riscv/xtheadmemidx-bug.c | 13 2 files changed, 34 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmemidx-bug.c diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md index d816f3b86dd..20e82e68df2 100644 --- a/gcc/config/riscv/thead.md +++ b/gcc/config/riscv/thead.md @@ -458,6 +458,27 @@ (define_insn "*th_mempair_load_zero_extendsidi2" ;; XTheadMemIdx +;; Help reload to add a displacement for the base register. +;; In the case `zext(*(uN*))(base+((rN<<1)&0x1fffe))` LRA splits +;; off two new instructions: a) `new_base = base + disp`, and +;; b) `index = (rN<<1)&0x1fffe`. The index calculation has no +;; corresponding instruction pattern and needs this insn_and_split +;; to recover. + +(define_insn_and_split "*th_memidx_operand" + [(set (match_operand:DI 0 "register_operand" "=r") + (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r") + (match_operand:QI 2 "imm123_operand" "Ds3")) + (match_operand 3 "const_int_operand" "n")))] + "TARGET_64BIT && TARGET_XTHEADMEMIDX && (lra_in_progress || reload_completed) + && (INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x" So this is a nasty little problem and touches on a deeper issue. The core problem is that combine doesn't know anything about constraints. It works with predicates and conditions.So combine has no idea if it's creating an ASM with operands that can't be handled. As a result combine has to be careful with ASMs. In can_combine_p we have: /* Can't merge an ASM_OPERANDS. */ || GET_CODE (src) == ASM_OPERANDS Which is part of a large conditional indicating when we can't combine two insns together. So I would have expected it to reject this insn for combining: (insn 12 11 0 2 (set (mem/f:DI (reg/v/f:DI 138 [ e ]) [2 *e_6+0 S8 A64]) (asm_operands:DI ("") ("=A") 0 [ (mem/f:DI (reg/v/f:DI 138 [ e ]) [2 *e_6+0 S8 A64]) ] [ (asm_input:DI ("A") j.c:8) ] [] j.c:8)) "j.c":8:3 -1 (expr_list:REG_DEAD (reg/v/f:DI 138 [ e ]) (nil))) So the natural question is why did insn 12 participate in combination at all: Trying 11 -> 12: 11: r138:DI=r145:DI+r144:DI REG_DEAD r145:DI REG_DEAD r144:DI 12: [r138:DI]=asm_operands REG_DEAD r138:DI Failed to match this instruction: (set (mem/f:DI (plus:DI (reg/f:DI 145 [ b ]) (reg:DI 144 [ _4 ])) [2 *e_6+0 S8 A64]) (asm_operands:DI ("") ("=A") 0 [ (mem/f:DI (plus:DI (reg/f:DI 145 [ b ]) (reg:DI 144 [ _4 ])) [2 *e_6+0 S8 A64]) ] [ (asm_input:DI ("A") j.c:8) ] [] j.c:8)) allowing combination of insns 11 and 12 original costs 4 + 36 = 40 replacement cost 36 deferring deletion of insn with uid = 11. modifying insn i312: [r145:DI+r144:DI]=asm_operands REG_DEAD r144:DI REG_DEAD r145:DI deferring rescan insn with uid = 12. So without a deep dive, my question is should this have been rejected for combination before we even got here? I just don't see how combine can do anything with asms other than replacing one pseudo with another since combine doesn't have any notion of constraints. Segher, comments? jeff
[PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk
Avoid using gensub that FreeBSD awk lacks, use gsub and split those each of gawk, mawk, and FreeBSD awk provides. Reported-by: mp...@vip.163.com Link: https://man.freebsd.org/cgi/man.cgi?query=awk gcc/ChangeLog: * config/loongarch/genopts/gen-evolution.awk: Avoid using gensub that FreeBSD awk lacks. --- Manually tested the script with gawk and FreeBSD awk. Ok for trunk? gcc/config/loongarch/genopts/gen-evolution.awk | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/gcc/config/loongarch/genopts/gen-evolution.awk b/gcc/config/loongarch/genopts/gen-evolution.awk index bf16b26760e..142b658fe7a 100644 --- a/gcc/config/loongarch/genopts/gen-evolution.awk +++ b/gcc/config/loongarch/genopts/gen-evolution.awk @@ -33,10 +33,12 @@ BEGIN { { cpucfg_word[NR] = $1 cpucfg_bit_in_word[NR] = $2 -name[NR] = gensub(/-/, "_", "g", $3) +name[NR] = $3 +gsub("-", "_", name[NR]) name_capitalized[NR] = toupper(name[NR]) -isa_version_major[NR] = gensub(/^([1-9][0-9]*)\.([0-9]+)$/, "\\1", 1, $4) -isa_version_minor[NR] = gensub(/^([1-9][0-9]*)\.([0-9]+)$/, "\\2", 1, $4) +split($4, isa_ver, "\\.") +isa_version_major[NR] = isa_ver[1] +isa_version_minor[NR] = isa_ver[2] $1 = $2 = $3 = $4 = "" sub (/^\s*/, "") -- 2.49.0
Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]
On 4/1/25 20:15, Vineet Gupta wrote: On 3/31/25 23:48, Heinrich Schuchardt wrote: On 3/30/25 01:49, Vineet Gupta wrote: changes since v2 - dump log sanfu --- vsetvl phase4 uses LCM guided info to insert VSETVL insns. It has an additional loop to insert missing vsetvls on certain edges. Currently it asserts/aborts on encountering EDGE_ABNORMAL. When enabling go frontend with V enabled, libgo build hits the assert. It seems to be safe to just skip the abnormal edge. Hello Vineet, Is there a test case where only following an abnormal edge between code blocks would require to call VSETVL? In the sketched code below following the exception would require VSETVL to be called while the edge from the try block to the finally block would not require this. try { for (..) { uint16_t vectorizable math if (condition) throw exception uint16_t vectorizable math } for (..) { uint8_t vectorizable math } } catch exception { } finally for (..) { uint8_t vectorizable math } } Yeah we are going to run testsuite with -fnon-call-exceptions to find such cases. But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs. code complexity trade-off). After all we'd just endup with an extra VSETVL in the finally block. Thx, -Vineet My worry is not superfluous vsetvl statements but missing ones on abnormal edges. Your patch sounds to me like: Ignore abnormal edges. Never insert vsetvl statements there. We need to test that on an abnormal edge we are inserting a vsetvl instruction when needed. As abnormal edges typically don't connect two vectorized code-blocks with different vsetvl requirements this requires analyzing generated code for this specific scenario. Best regards Heinrich
Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]
On 4/1/25 10:46 PM, Heinrich Schuchardt wrote: On 4/1/25 20:15, Vineet Gupta wrote: On 3/31/25 23:48, Heinrich Schuchardt wrote: On 3/30/25 01:49, Vineet Gupta wrote: changes since v2 - dump log sanfu --- vsetvl phase4 uses LCM guided info to insert VSETVL insns. It has an additional loop to insert missing vsetvls on certain edges. Currently it asserts/aborts on encountering EDGE_ABNORMAL. When enabling go frontend with V enabled, libgo build hits the assert. It seems to be safe to just skip the abnormal edge. Hello Vineet, Is there a test case where only following an abnormal edge between code blocks would require to call VSETVL? In the sketched code below following the exception would require VSETVL to be called while the edge from the try block to the finally block would not require this. try { for (..) { uint16_t vectorizable math if (condition) throw exception uint16_t vectorizable math } for (..) { uint8_t vectorizable math } } catch exception { } finally for (..) { uint8_t vectorizable math } } Yeah we are going to run testsuite with -fnon-call-exceptions to find such cases. But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs. code complexity trade-off). After all we'd just endup with an extra VSETVL in the finally block. Thx, -Vineet My worry is not superfluous vsetvl statements but missing ones on abnormal edges. Your patch sounds to me like: Ignore abnormal edges. Never insert vsetvl statements there. We need to test that on an abnormal edge we are inserting a vsetvl instruction when needed. As abnormal edges typically don't connect two vectorized code-blocks with different vsetvl requirements this requires analyzing generated code for this specific scenario. Which is what Vineet, Robin, myself and others discussed at length in this morning's patch review meeting. It's a classic problem with LCM based algorithms. We can't just ignore the edge. What we instead need to do is ensure the algorithm never wants to insert on these problematical edges. This is typically handled by "killing" expressions so that they don't propagate to these problem edges. jeff
Re: [PATCH] PR119482: Avoid mispredictions in bitmap_set_bit
> Am 02.04.2025 um 00:02 schrieb Andi Kleen : > > From: Andi Kleen > > This isn't a regression, but it's a very simple patch with high > performance improvement, so perhaps suitable in the current stage. Ok. Can you add a comment as to why we store unconditionally? Richard > --- > > bitmap_set_bit checks the original value of the bit to return it to the > caller and then only writes the new value back if it changes. > Most callers of bitmap_set_bit don't need the return value, but with the > conditional store > the CPU still has to predict it correctly since gcc doesn't know how to do > that without APX on x86 (even though CMOV could do it with a dummy target). > Really if-conversion should handle this case, but for now we can fix > it. > > This simple patch improves runtime by 15% for the test case in the PR. > Which is more than I expected given it only has ~1.44% of the cycles, but I > guess > the mispredicts caused some down stream effects. > > ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet ran >1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 > pr119482.cc -quiet > > At least with this test case the total number of branches decreases > drastically. Even though the mispredict rate goes up slightly it is > still a big win. > > $ perf stat -e > branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ > \ > -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet -w > > Performance counter stats for 'system wide': > >41,932,957,091 branches > 686,117,623 branch-misses#1.64% of all > branches > 43,690.47 MiB uncore_imc/cas_count_read/ > 12,362.56 MiB uncore_imc/cas_count_write/ > > 49.328633365 seconds time elapsed > > $ perf stat -e > branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ > \ > -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet -w > > Performance counter stats for 'system wide': > >37,092,113,179 branches > 663,641,708 branch-misses#1.79% of all > branches > 43,196.52 MiB uncore_imc/cas_count_read/ > 12,369.33 MiB uncore_imc/cas_count_write/ > > 42.632458350 seconds time elapsed > > gcc/ChangeLog: > >PR middle-end/119482 >* bitmap.cc (bitmap_set_bit): Write back value unconditionally > --- > gcc/bitmap.cc | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc > index f5a64b495ab3..7744f8f8c2e4 100644 > --- a/gcc/bitmap.cc > +++ b/gcc/bitmap.cc > @@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit) > if (ptr != 0) > { > bool res = (ptr->bits[word_num] & bit_val) == 0; > - if (res) > -ptr->bits[word_num] |= bit_val; > + ptr->bits[word_num] |= bit_val; > return res; > } > > -- > 2.49.0 >
Re: [PATCH] Libstdc++: Fix bootstrap failure for cross without tm.tm_zone [PR119550]
On Tue, Apr 1, 2025 at 2:46 PM Jonathan Wakely wrote: > On Tue, 1 Apr 2025 at 11:34, Tomasz Kaminski wrote: > > > > > > > > On Mon, Mar 31, 2025 at 7:28 PM Jonathan Wakely > wrote: > >> > >> In r15-8491-g778c28c70f8573 I added a use of the Autoconf macro > >> AC_STRUCT_TIMEZONE, but that requires a link-test for the global tzname > >> object if tm.tm_zone isn't supported. That link-test isn't allowed for > >> cross-compilation, so bootstrap fails if tm.tm_zone isn't supported. > >> > >> Since libstdc++ only cares about tm.tm_zone and won't use tzname anyway, > >> we don't need the link-test. Replace AC_STRUCT_TIMEZONE with a custom > >> macro that only checks for tm.tm_zone. We can improve on the Autoconf > >> macro by checking it's a suitable type, which isn't actually checked by > >> AC_STRUCT_TIMEZONE. > >> > >> libstdc++-v3/ChangeLog: > >> > >> PR libstdc++/119550 > >> * acinclude.m4 (GLIBCXX_STRUCT_TM_TM_ZONE): New macro. > >> * config.h.in: Regenerate. > >> * configure: Regenerate. > >> * configure.ac: Use GLIBCXX_STRUCT_TM_TM_ZONE. > >> * include/bits/chrono_io.h (__formatter_chrono::_M_c): Check > >> _GLIBCXX_USE_STRUCT_TM_TM_ZONE instead of > >> _GLIBCXX_HAVE_STRUCT_TM_TM_ZONE. > >> --- > >> > >> Testing x86_64-linux and sparc-solaris2.11, looks fine so far. > >> > >> libstdc++-v3/acinclude.m4 | 35 > >> libstdc++-v3/config.h.in | 21 +-- > >> libstdc++-v3/configure| 238 -- > >> libstdc++-v3/configure.ac | 5 +- > >> libstdc++-v3/include/bits/chrono_io.h | 2 +- > >> 5 files changed, 109 insertions(+), 192 deletions(-) > >> > >> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 > >> index e668d2dba27..02fd349e11d 100644 > >> --- a/libstdc++-v3/acinclude.m4 > >> +++ b/libstdc++-v3/acinclude.m4 > >> @@ -5744,6 +5744,41 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [ > >>fi > >> ]) > >> > >> +dnl > >> +dnl Check for a tm_zone member in struct tm. > >> +dnl > >> +dnl This member is defined as const char* in Glibc, newlib, > POSIX.1-2024, > >> +dnl and as char* in BSD (including macOS). > >> +dnl > >> +dnl Defines: > >> +dnl _GLIBCXX_USE_STRUCT_TM_TM_ZONE if struct tm has a tm_zone member. > >> +dnl > >> +AC_DEFUN([GLIBCXX_STRUCT_TM_TM_ZONE], [ > >> + > >> + AC_LANG_SAVE > > > > From documentation this is deprecated in favor of AC_LANG_PUSH. > > It looks like that is supported by autoconf 2.69 and is already used > elsewhere in GCC. We don't currently use it in libstdc++ but it should > be safe to do so. > > >> + AC_LANG_CPLUSPLUS > >> + ac_save_CXXFLAGS="$CXXFLAGS" > >> + CXXFLAGS="$CXXFLAGS -std=c++20" > > > > The program that is compiled does not seem to require C++20. > > If we change the declaration of "t" to use "= {}", we could do it in > C++98. > > Any reason to adjust the flags at all? > > We only need to use the tm_zone member in C++20 mode, and it's > possible that on some platform it's not exposed for older standards > (very unlikely, but possible). I wanted to test for exactly what we > require. I don't feel strongly about it though. > I would found it surprising if the layout of the structure would fiffer that much, wouldn't this cause ABI compatibility problems for TU that memcopies struct tm and are compiled with different language versions? I have slight preference towards not overriding CXXFLAGS, as for me this suggest that struct tm was changed in C++20 (or linked C standard), which is not the case. But I am fine either way. > > >> > >> + > >> + AC_CACHE_CHECK([for tm_zone member of struct tm], > glibcxx_cv_tm_zone, [ > >> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([#include > >> + ], > >> + [struct tm t{}; t.tm_zone = (char*)0;] > >> + )], > >> + [glibcxx_cv_tm_zone=yes], > >> + [glibcxx_cv_tm_zone=no] > >> + ) > >> +]) > >> + > >> + if test $glibcxx_cv_tm_zone = yes; then > >> +AC_DEFINE(_GLIBCXX_USE_STRUCT_TM_TM_ZONE, 1, > >> + [Define if struct tm has a tm_zone member.]) > >> + fi > >> + > >> + CXXFLAGS="$ac_save_CXXFLAGS" > >> + AC_LANG_RESTORE > >> +]) > >> + > >> dnl > >> dnl Check whether lock tables can be aligned to avoid false sharing. > >> dnl > >> diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in > >> index be151f43dd6..77bbaf1beaa 100644 > >> --- a/libstdc++-v3/config.h.in > >> +++ b/libstdc++-v3/config.h.in > >> @@ -74,10 +74,6 @@ > >> don't. */ > >> #undef HAVE_DECL_STRNLEN > >> > >> -/* Define to 1 if you have the declaration of `tzname', and to 0 if > you don't. > >> - */ > >> -#undef HAVE_DECL_TZNAME > >> - > >> /* Define to 1 if you have the header file. */ > >> #undef HAVE_DIRENT_H > >> > >> @@ -412,9 +408,6 @@ > >> /* Define to 1 if `d_type' is a member of `struct dirent'. */ > >> #undef HAVE_STRUCT_DIRENT_D_TYPE > >> > >> -/* Define to 1 if `tm_zone' is a member of `struct tm'. */ > >> -#undef HAVE_STR
Re: [PATCH] target/119549 - fixup handling of -mno-sse4
On Tue, Apr 1, 2025 at 3:56 PM Jakub Jelinek wrote: > > On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > > >Changing ix86_valid_target_attribute_inner_p might be even better because > > >OPT_msse4 is RejectNegative option, so !value for it looks weird. > > msse4 is defined as ix86_opt_isa in ix86_valid_target_attribute_inner_p > > > > 1055IX86_ATTR_ISA ("sse4", OPT_msse4), > > > > and would be handled in ix86_handle_option > > > > 1282 else if (type == ix86_opt_isa) > > 1283{ > > 1284 struct cl_decoded_option decoded; > > 1285 > > 1286 generate_option (opt, NULL, opt_set_p, CL_TARGET, > > &decoded); > > 1287 ix86_handle_option (opts, opts_set, > > 1288 &decoded, input_location); > > 1289} > > > > So I think it's already correct here. > > The entries are > msse4 > Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save > Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and > code generation. > > mno-sse4 > Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save > Do not support SSE4.1 and SSE4.2 built-in functions and code generation. > > so if it is turned into something without RejectNegative > it will be wrong in the Mask argument, either for the positive or for the > negative case. For the command line, or target attribute, the actual operation goes into ix86_handle_option, and as long as we get it right in this ix86_handle_option, everything else should be fine. As for the macros generated by the mask name (TARGET_SSE4_1_P), their meanings remain the same, so in my opinion they can be handled this way. But to be on the safe side, I agree that it should be adjusted in ix86_valid_target_attribute_inner_p first. In GCC16, we can do a refactor for this. > > I wonder though if msse4 should not be a Target RejectNegative Alias to > msse4.2 > and mno-sse4 RejectNegative Alias to mno-sse4.1. > Though, unsure if one will still be able to specify it in #pragma GCC target > or target attribute... > > Jakub > -- BR, Hongtao
Re: [PATCH] tailc: Improve tail recursion handling [PR119493]
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > Hi! > > This is a partial step towards fixing that PR. > For musttail recursive calls which have non-is_gimple_reg_type typed > parameters, the only case we've handled was if the exact parameter > was passed through (perhaps modified, but still the same PARM_DECL). > That isn't necessary, we can copy the argument to the parameter as well > (just need to watch for the use of the parameter in later arguments, > say musttail recursive call which swaps 2 structure arguments). > > The patch attempts to play safe and punts if any of the parameters are > addressable (like we do for all normal tail calls and tail recursions, > except for musttail in the posted unreviewed patch). > > With this patch (at least when early inlining isn't done on not yet > optimized body) inlining should see already tail recursion optimized > body and will not have problems with SRA breaking musttail. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? This looks OK, but I wonder if ... > 2025-04-01 Jakub Jelinek > > PR tree-optimization/119493 > * tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion > if some arguments don't have is_gimple_reg_type, only punt if they > have non-POD types, or volatile, or addressable. Set > tailr_arg_needs_copy in those cases too. > (eliminate_tail_call): Copy call arguments to params if they don't > have is_gimple_reg_type, use temporaries if the argument is used > later. > (tree_optimize_tail_calls_1): Skip !is_gimple_reg_type > tailr_arg_needs_copy parameters. Formatting fix. > > * gcc.dg/pr119493-1.c: New test. > > --- gcc/tree-tailcall.cc.jj 2025-03-28 10:49:30.0 +0100 > +++ gcc/tree-tailcall.cc 2025-03-31 12:31:01.717603446 +0200 > @@ -672,19 +672,16 @@ find_tail_calls (basic_block bb, struct >have a copyable type and the two arguments must have reasonably >equivalent types. The latter requirement could be relaxed if >we emitted a suitable type conversion statement. */ > - if (!is_gimple_reg_type (TREE_TYPE (param)) > + if (TREE_ADDRESSABLE (TREE_TYPE (param)) > || !useless_type_conversion_p (TREE_TYPE (param), >TREE_TYPE (arg))) > break; > > - /* The parameter should be a real operand, so that phi node > - created for it at the start of the function has the meaning > - of copying the value. This test implies is_gimple_reg_type > - from the previous condition, however this one could be > - relaxed by being more careful with copying the new value > - of the parameter (emitting appropriate GIMPLE_ASSIGN and > - updating the virtual operands). */ > - if (!is_gimple_reg (param)) > + if (is_gimple_reg_type (TREE_TYPE (param)) > + ? !is_gimple_reg (param) ... we want to restrict this to musttail calls at this point and relax for stage1 only? > + : (!is_gimple_variable (param) > + || TREE_THIS_VOLATILE (param) > + || may_be_aliased (param))) > break; > } > } > @@ -934,9 +931,9 @@ find_tail_calls (basic_block bb, struct > param = DECL_CHAIN (param), idx++) > { > tree ddef, arg = gimple_call_arg (call, idx); > - if (is_gimple_reg (param) > - && (ddef = ssa_default_def (cfun, param)) > - && (arg != ddef)) > + if (!is_gimple_reg (param) > + || ((ddef = ssa_default_def (cfun, param)) > + && arg != ddef)) > bitmap_set_bit (tailr_arg_needs_copy, idx); > } > } > @@ -1212,6 +1209,7 @@ eliminate_tail_call (struct tailcall *t, > >/* Add phi node entries for arguments. The ordering of the phi nodes > should > be the same as the ordering of the arguments. */ > + auto_vec copies; >for (param = DECL_ARGUMENTS (current_function_decl), >idx = 0, gpi = gsi_start_phis (first); > param; > @@ -1220,6 +1218,35 @@ eliminate_tail_call (struct tailcall *t, >if (!bitmap_bit_p (tailr_arg_needs_copy, idx)) > continue; > > + if (!is_gimple_reg_type (TREE_TYPE (param))) > + { > + if (param == gimple_call_arg (stmt, idx)) > + continue; > + /* First check if param isn't used by any of the following > + call arguments. If it is, we need to copy first to > + a temporary and only after doing all the assignments copy it > + to param. */ > + size_t idx2 = idx + 1; > + tree param2 = DECL_CHAIN (param); > + for (; param2; param2 = DECL_CHAIN (param2), idx2++) > + if (!is_gimple_reg_type (TREE_TYPE (param))) > + { > + tree base = get_base_address (gimple_call_arg (stmt, idx2
Re: [PATCH] tailr: Punt on tail recursions that would break musttail [PR119493]
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > Hi! > > While working on the previous tailc patch, I've noticed the following > problem. > The testcase below fails, because we decide to tail recursion optimize > the call, but tail recursion (as documented in tree-tailcall.cc) needs to > add some result multiplication and/or addition if any tail recursion uses > accumulator, which is added right before the return. > So, if there are musttail non-recurive calls in the function, successful > tail recursion optimization will mean we'll later error on the musttail > calls. musttail recursive calls are ok, those would be tail recursion > optimized. > > So, the following patch punts on all tail recursion optimizations if it > needs accumulators (add and/or mult) if there is at least one non-recursive > musttail call. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Do we update cfun->has_musttail at some point for optimized away musttail calls with respect to inlining/DCE? I do wonder whether tree-tailcall.cc itself could "quickly" determine whether we have any musttail calls that we can handle and will not diagnose? Thanks, Richard. > 2025-04-01 Jakub Jelinek > > PR tree-optimization/119493 > * tree-tailcall.cc (tree_optimize_tail_calls_1): Ignore tail recursion > candidates which need accumulators if there is at least one musttail > non-recursive call. > > * gcc.dg/pr119493-2.c: New test. > > --- gcc/tree-tailcall.cc.jj 2025-03-31 13:00:58.288725496 +0200 > +++ gcc/tree-tailcall.cc 2025-03-31 14:34:31.285106333 +0200 > @@ -1373,6 +1373,38 @@ tree_optimize_tail_calls_1 (bool opt_tai >live_vars = NULL; > } > > + if (cfun->has_musttail) > +{ > + /* We can't mix non-recursive must tail calls with tail recursive > + calls which require accumulators, because in that case we have to > + emit code in between the musttail calls and return, which prevent > + calling them as tail calls. So, in that case give up on the > + tail recursion. */ > + for (act = tailcalls; act; act = act->next) > + if (!act->tail_recursion) > + { > + gcall *call = as_a (gsi_stmt (act->call_gsi)); > + if (gimple_call_must_tail_p (call)) > + break; > + } > + if (act) > + for (struct tailcall **p = &tailcalls; *p; ) > + { > + if ((*p)->tail_recursion && ((*p)->add || (*p)->mult)) > + { > + struct tailcall *a = *p; > + *p = (*p)->next; > + gcall *call = as_a (gsi_stmt (a->call_gsi)); > + maybe_error_musttail (call, > + _("tail recursion with accumulation " > + "mixed with musttail " > + "non-recursive call"), diag_musttail); > + free (a); > + } > + else > + p = &(*p)->next; > + } > +} >/* Construct the phi nodes and accumulators if necessary. */ >a_acc = m_acc = NULL_TREE; >for (act = tailcalls; act; act = act->next) > --- gcc/testsuite/gcc.dg/pr119493-2.c.jj 2025-03-31 14:41:18.977464154 > +0200 > +++ gcc/testsuite/gcc.dg/pr119493-2.c 2025-03-31 14:45:24.327068682 +0200 > @@ -0,0 +1,22 @@ > +/* PR tree-optimization/119493 */ > +/* { dg-do compile { target musttail } } */ > +/* { dg-options "-O2 -fdump-tree-tailr1-details" } */ > +/* { dg-final { scan-tree-dump-times "tail recursion with accumulation mixed > with musttail non-recursive call" 2 "tailr1" } } */ > + > +[[gnu::noipa]] int > +bar (int x, int y) > +{ > + return x + y; > +} > + > +[[gnu::noinline, gnu::noclone]] int > +foo (int x, int y) > +{ > + if (x < 10) > +[[gnu::musttail]] return bar (x, y); > + if (y & 2) > +return foo (x - 1, y) * 2; > + if (y & 1) > +[[gnu::musttail]] return foo (x - 1, y); > + return foo (x - 1, y) * 3; > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] sra: Avoid creating TBAA hazards (PR118924)
On Mon, 31 Mar 2025, Martin Jambor wrote: > Hi, > > the testcase in PR 118924, when compiled on Aarch64, contains an > gimple aggregate assignment statement in between different types which > are types_compatible_p but behave differently for the purposes of > alias analysis. > > SRA replaces the statement with a series of scalar assignments which > however have LHSs access chains modeled on the RHS type and so do not > alias with a subsequent reads and so are DSEd. > > SRA clearly gets its "same_access_path" logic subtly wrong. One issue > is that the same_access_path_p function probably should be implemented > more along the lines of (parts of ao_compare::compare_ao_refs) instead > of internally relying on operand_equal_p. That is however not the > problem in the PR and so I will deal with it only later. > > The issue here is that even when the access path is the same, it must > not be bolted on an aggregate type that does not match. This patch > does that, taking just one simple function from the > ao_compare::compare_ao_refs machinery and using it to detect the > situation. The rest is just merging the information in between > accesses of the same access group. > > I looked at how many times we come across such assignment during > "make stage2-bubble" of GCC (configured with only c and C++ and > without multilib and libsanitizers) and on an x86_64 there were 87924 > such assignments (though now I realize not all of them had to be > aggregate), so they do happen. The patch leads to about 5% increase > of cases where we don't use an "access path" but resort to a > MEM_REF (from 90209 to 95204). On an Aarch64, there were 92268 such > assignments and the increase of falling back to MEM_REFs was by > 4% (but from a bigger base 132983 to 107991). > > Bootstrapped on x86_64-linux and Aarch64-linux. OK for master and then > for all active release branches? I'm unsure about the lto_streaming_safe arg, in fact the implementation is if (lto_streaming_safe) return type1 == type2; else return TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2); but I think we guarantee (we _need_ to guarantee!) that when TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2), after LTO streaming it's still TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2). Otherwise assignments previously valid GIMPLE might no longer be valid and things that aliased previously might no longer alias. But that's an implementation bug in types_equal_for_same_type_for_tbaa_p, but I think you should be able to pass in false as if the implementation were fixed. IMO if necessary the implementation itself should use sth like if (flag_lto && !in_lto_p) rather than leaving it up to the caller ... the only existing caller uses lto_streaming_expected_p () as arg, which is similar to my proposal. I'd say you want to export at a forwarder bool types_equal_for_same_type_for_tbaa_p (tree type1, tree type2) { return types_equal_for_same_type_for_tbaa_p (type1, type2, lto_streaming_expected_p ()); } instead as it should be an internal detail. OK with that change. Can you fixup the comment /* Return ture if TYPE1 and TYPE2 will always give the same answer when compared wit hother types using same_type_for_tbaa_p. */ when you are there? The function is called same_type_for_tbaa and 'wit hother' should be 'with other' Richard. > Thanks, > > Martin > > > gcc/ChangeLog: > > 2025-03-24 Martin Jambor > > PR tree-optimization/118924 > * tree-ssa-alias-compare.h (types_equal_for_same_type_for_tbaa_p): > Declare. > * tree-ssa-alias.cc (types_equal_for_same_type_for_tbaa_p): Make > public. > * tree-sra.cc: Include tree-ssa-alias-compare.h. > (create_access): Initialzie grp_same_access_path to true. > (build_accesses_from_assign): Detect tbaa hazards and clear > grp_same_access_path fields of involved accesses when they occur. > (sort_and_splice_var_accesses): Take previous values of > grp_same_access_path into account. > > gcc/testsuite/ChangeLog: > > 2025-03-25 Martin Jambor > > PR tree-optimization/118924 > * g++.dg/tree-ssa/pr118924.C: New test. > --- > gcc/testsuite/g++.dg/tree-ssa/pr118924.C | 29 > gcc/tree-sra.cc | 18 --- > gcc/tree-ssa-alias-compare.h | 3 +++ > gcc/tree-ssa-alias.cc| 2 +- > 4 files changed, 48 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr118924.C > > diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr118924.C > b/gcc/testsuite/g++.dg/tree-ssa/pr118924.C > new file mode 100644 > index 000..c95eacafc9c > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr118924.C > @@ -0,0 +1,29 @@ > +/* { dg-do run } */ > +/* { dg-options "-std=c++17 -O2" } */ > + > +template struct Vector { > + int m_data[Size]; > + Vector(int, int, int) {} > +}; > +enum class E { POINTS, LINES, TRIANGLES }; >
Re: [PATCH] c++/modules: Forbid exposures of TU-local entities in inline variables [PR119551]
On 4/1/25 7:02 AM, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- An inline variable has vague linkage, and needs to be conditionally emitted in TUs that reference it. Unfortunately this clashes with [basic.link] p14.2, which says that we ignore the initialisers of all variables (including inline ones), since importers will not have access to the referenced TU-local entities to write the definition. This patch makes such exposures be ill-formed. One case that continues to work is if the exposure is part of the dynamic initialiser of an inline variable; in such cases, the definition has been built as part of the module interface unit anyway, and importers don't need to write it out again, so such exposures are "harmless". OK. PR c++/119551 gcc/cp/ChangeLog: * module.cc (trees_out::write_var_def): Only ignore non-inline variable initializers. gcc/testsuite/ChangeLog: * g++.dg/modules/internal-5_a.C: Add cases that should be ignored. * g++.dg/modules/internal-5_b.C: Test these new cases, and make the testcase more robust. * g++.dg/modules/internal-11.C: New test. * g++.dg/modules/internal-12_a.C: New test. * g++.dg/modules/internal-12_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 7 +++--- gcc/testsuite/g++.dg/modules/internal-11.C | 24 gcc/testsuite/g++.dg/modules/internal-12_a.C | 13 +++ gcc/testsuite/g++.dg/modules/internal-12_b.C | 14 gcc/testsuite/g++.dg/modules/internal-5_a.C | 8 ++- gcc/testsuite/g++.dg/modules/internal-5_b.C | 6 + 6 files changed, 68 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/internal-11.C create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_a.C create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_b.C diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 894c70f7225..ce22b2ece3f 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -12679,9 +12679,10 @@ trees_in::read_function_def (tree decl, tree maybe_template) void trees_out::write_var_def (tree decl) { - /* The initializer of a variable or variable template is ignored for - determining exposures. */ - auto ovr = make_temp_override (dep_hash->ignore_tu_local, VAR_P (decl)); + /* The initializer of a non-inline variable or variable template is + ignored for determining exposures. */ + auto ovr = make_temp_override (dep_hash->ignore_tu_local, +VAR_P (decl) && !DECL_INLINE_VAR_P (decl)); tree init = DECL_INITIAL (decl); tree_node (init); diff --git a/gcc/testsuite/g++.dg/modules/internal-11.C b/gcc/testsuite/g++.dg/modules/internal-11.C new file mode 100644 index 000..53eb30a62be --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/internal-11.C @@ -0,0 +1,24 @@ +// PR c++/119551 +// { dg-additional-options "-fmodules" } +// { dg-module-cmi !M } + +export module M; + +static int tu_local = 5; +static int* foo() { return &tu_local; } + +// For implementation reasons, we adjust [basic.link] p14.2 to restrict ignored +// exposures to non-inline variables, since for inline variables without +// dynamic initialisation we need to emit their initialiser for importer use. + +int* a = &tu_local; // OK +inline int* b = &tu_local; // { dg-error "exposes TU-local entity" } + +// But dynamic initialisers are fine, importers will just treat them as external. +inline int* c = foo(); // OK + +// For consistency, we follow the same rules with templates, noting that +// we still need to emit definitions with dynamic initializers so we error. +template int* d = &tu_local; // OK +template inline int* e = &tu_local; // { dg-error "exposes TU-local entity" } +template inline int* f = foo(); // { dg-error "exposes TU-local entity" } diff --git a/gcc/testsuite/g++.dg/modules/internal-12_a.C b/gcc/testsuite/g++.dg/modules/internal-12_a.C new file mode 100644 index 000..5c4e7c602b0 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/internal-12_a.C @@ -0,0 +1,13 @@ +// PR c++/119551 +// { dg-additional-options "-fmodules" } +// { dg-module-cmi M } +// Test that emitting variables referencing TU-local entities +// builds and runs correctly. + +export module M; + +static int tu_local_var = 5; +static int* tu_local_func() { return &tu_local_var; } + +export int* a = &tu_local_var; +export inline int* b = tu_local_func(); diff --git a/gcc/testsuite/g++.dg/modules/internal-12_b.C b/gcc/testsuite/g++.dg/modules/internal-12_b.C new file mode 100644 index 000..bc3edf99a11 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/internal-12_b.C @@ -0,0 +1,14 @@ +// PR c++/119551 +// { dg-module-do run } +// { dg-additional-options "-fmodules" } + +import M; + +int main() { + if (*a != 5) +__builtin_abort(); + if (*b != 5) +__builtin_abort(); + if (a
Re: [PATCH] target/119549 - fixup handling of -mno-sse4
On Tue, 1 Apr 2025, Jakub Jelinek wrote: > On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > > >Changing ix86_valid_target_attribute_inner_p might be even better because > > >OPT_msse4 is RejectNegative option, so !value for it looks weird. > > msse4 is defined as ix86_opt_isa in ix86_valid_target_attribute_inner_p > > > > 1055IX86_ATTR_ISA ("sse4", OPT_msse4), > > > > and would be handled in ix86_handle_option > > > > 1282 else if (type == ix86_opt_isa) > > 1283{ > > 1284 struct cl_decoded_option decoded; > > 1285 > > 1286 generate_option (opt, NULL, opt_set_p, CL_TARGET, > > &decoded); > > 1287 ix86_handle_option (opts, opts_set, > > 1288 &decoded, input_location); > > 1289} > > > > So I think it's already correct here. > > The entries are > msse4 > Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save > Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and > code generation. > > mno-sse4 > Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save > Do not support SSE4.1 and SSE4.2 built-in functions and code generation. > > so if it is turned into something without RejectNegative, > it will be wrong in the Mask argument, either for the positive or for the > negative case. > > I wonder though if msse4 should not be a Target RejectNegative Alias to > msse4.2 > and mno-sse4 RejectNegative Alias to mno-sse4.1. > Though, unsure if one will still be able to specify it in #pragma GCC target > or target attribute... Yeah, I think the lack of Alias(-msse4.1,-msse4.2) aka alias to enable/disable two options is what makes it difficult. I'll post a v2 momentarily fixing up ix86_valid_target_attribute_inner_p instead. That at least seems quite a safe change (also for branches). Richard.
[PATCH] simplify-rtx: Fix shortcut for vector eq/ne
This patch forestalls a regression in gcc.dg/rtl/x86_64/vector_eq.c with the patch for PR116398. The test wants: (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)]))) (cinsn 5 (set (reg:V4SI <2>) (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1> to be folded to a vector of -1s. One unusual thing about the fold is that the <1> in the second insn is uninitialised; it looks like it should be replaced by <0>, or that there should be an insn 4 that copies <0> to <1>. As it stands, the test relies on init-regs to insert a zero initialisation of <1>. This happens after all the cse/pre/fwprop stuff, with only dce passes between init-regs and combine. Combine therefore sees: (insn 3 2 8 2 (set (reg:V4SI 98) (const_vector:V4SI [ (const_int 0 [0]) repeated x4 ])) 2403 {movv4si_internal} (nil)) (insn 8 3 9 2 (clobber (reg:V4SI 99)) -1 (nil)) (insn 9 8 5 2 (set (reg:V4SI 99) (const_vector:V4SI [ (const_int 0 [0]) repeated x4 ])) -1 (nil)) (insn 5 9 7 2 (set (reg:V4SI 100) (eq:V4SI (reg:V4SI 98) (reg:V4SI 99))) 7874 {*sse2_eqv4si3} (expr_list:REG_DEAD (reg:V4SI 99) (expr_list:REG_DEAD (reg:V4SI 98) (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [ (const_int 0 [0]) repeated x4 ]) (reg:V4SI 99)) (nil) It looks like the test should then pass through a 3, 9 -> 5 combination, so that we get an (eq ...) between two zeros and fold it to a vector of -1s. But although the combination is attempted, the fold doesn't happen. Instead, combine is left to match the unsimplified (eq ...) between two zeros, which rightly fails. The test only passes because late_combine2 happens to try simplifying an (eq ...) between reg X and reg X, which does fold to a vector of -1s. The different handling of registers and constants is due to this code in simplify_const_relational_operation: if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx && (code == EQ || code == NE) && ! ((REG_P (op0) || CONST_INT_P (trueop0)) && (REG_P (op1) || CONST_INT_P (trueop1))) && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0 /* We cannot do this if tem is a nonzero address. */ && ! nonzero_address_p (tem)) return simplify_const_relational_operation (signed_condition (code), mode, tem, const0_rtx); INTEGRAL_MODE_P matches vector integer modes, but everything else about the condition is written for scalar integers only. Thus if trueop0 and trueop1 are equal vector constants, we'll bypass all the exclusions and try simplifying a subtraction. This will succeed, giving a vector of zeros. The recursive call will then try to simplify a comparison between the vector of zeros and const0_rtx, which isn't well-formed. Luckily or unluckily, the ill-formedness doesn't trigger an ICE, but it does prevent any simplification from happening. The least-effort fix would be to replace INTEGRAL_MODE_P with SCALAR_INT_MODE_P. But the fold does make conceptual sense for vectors too, so it seemed better to keep the INTEGRAL_MODE_P and generalise the rest of the condition to match. Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install? I'm hoping to post the actual patch for PR116398 later today. Richard gcc/ * simplify-rtx.cc (simplify_const_relational_operation): Generalize the constant checks in the fold-via-minus path to match the INTEGRAL_MODE_P condition. --- gcc/simplify-rtx.cc | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index fe007bc7d96..6f969effdf9 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -6657,15 +6657,20 @@ simplify_const_relational_operation (enum rtx_code code, we do not know the signedness of the operation on either the left or the right hand side of the comparison. */ - if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx + if (INTEGRAL_MODE_P (mode) + && trueop1 != CONST0_RTX (mode) && (code == EQ || code == NE) - && ! ((REG_P (op0) || CONST_INT_P (trueop0)) - && (REG_P (op1) || CONST_INT_P (trueop1))) + && ! ((REG_P (op0) +|| CONST_SCALAR_INT_P (trueop0) +|| CONST_VECTOR_P (trueop0)) + && (REG_P (op1) + || CONST_SCALAR_INT_P (trueop1) + || CONST_VECTOR_P (trueop1))) && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0 /* We cannot do this if tem is a nonzero address. */ && ! nonzero_address_p (tem)) return simplify_const_relational_operation (signed_condition (code), - mode, tem, const0_rtx); +
Re: [PATCH][gcc-14] libstdc++: Fix bogus -Wstringop-overflow in std::vector::insert [PR117983]
On Tue, Apr 1, 2025 at 11:32 AM Jonathan Wakely wrote: > This was fixed on trunk by r15-4473-g3abe751ea86e34, but that isn't > suitable for backporting. Instead, just add another unreachable > condition in std::vector::_M_range_insert so the compiler knows this > memcpy doesn't use a length originating from a negative ptrdiff_t > converted to a very positive size_t. > > libstdc++-v3/ChangeLog: > > PR libstdc++/117983 > * include/bits/vector.tcc (vector::_M_range_insert): Add > unreachable condition to tell the compiler begin() <= end(). > * testsuite/23_containers/vector/modifiers/insert/117983.cc: New > test. > > (cherry picked from commit 878812b6f6905774ab37cb78903e3e11bf1c508c) > --- > > Tested x86_64-linux. > LGTM. > > This warning is a regression on 12/13/14 so this is needed on all > release branches. > > libstdc++-v3/include/bits/vector.tcc| 2 ++ > .../vector/modifiers/insert/117983.cc | 17 + > 2 files changed, 19 insertions(+) > create mode 100644 > libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc > > diff --git a/libstdc++-v3/include/bits/vector.tcc > b/libstdc++-v3/include/bits/vector.tcc > index 0458d560075..b7f472ae09c 100644 > --- a/libstdc++-v3/include/bits/vector.tcc > +++ b/libstdc++-v3/include/bits/vector.tcc > @@ -1002,6 +1002,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER > // reachable. > pointer __old_start = this->_M_impl._M_start; > pointer __old_finish = this->_M_impl._M_finish; > + if ((__old_finish - __old_start) < 0) > + __builtin_unreachable(); > > const size_type __len = > _M_check_len(__n, "vector::_M_range_insert"); > diff --git > a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc > b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc > new file mode 100644 > index 000..e6027a677ee > --- /dev/null > +++ > b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc > @@ -0,0 +1,17 @@ > +// { dg-options "-O3 -Werror=stringop-overflow" } > +// { dg-do compile } > + > +// PR libstdc++/117983 > +// -Wstringop-overflow false positive for __builtin_memmove from > vector::insert > + > +#include > + > +typedef std::vector bytes; > + > +void push(bytes chunk, bytes& data) { > + if (data.empty()) { > +data.swap(chunk); > + } else { > +data.insert(data.end(), chunk.begin(), chunk.end()); > + } > +} > -- > 2.49.0 > >
Re: [PATCH] combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote: > > Looking at this some more today, I think we should special case > > set_noop_p because that can be put into i2 (except for the JUMP_P > > violations), currently both modified_between_p (pc_rtx, i2, i3) > > and reg_used_between_p (pc_rtx, i2, i3) returns false. > > I'll post a patch incrementally for that (but that feels like > > new optimization, so probably not something that should be backported). > > Trying to review this I noticed that both the comment in combine suggests > that memory set is important and modified_between_p handles MEM_P > 'x' and for non-REG 'x' possibly MEM operands (CONCAT?) while > reg_used_between_p only handles REG_P 'reg'. Also shouldn't > you use reg_referenced_p? At least that function seems to handle > SET_SRC/SET_DEST separately? > > Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why > does the comment talk about memory? I was worried about making too risky changes this late in stage4 (and especially also for backports). Most of this code is 1992-ish. I think many of the functions are just misnamed, the reg_ in there doesn't match what those functions do (bet they initially supported just REGs and later on support for other kinds of expressions was added, but haven't done git archeology to prove that). What we know for sure is: && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART that is checked earlier in the condition. Then it calls && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), XVECEXP (newpat, 0, 0)) && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), XVECEXP (newpat, 0, 1)) While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p which is also misnamed, that function handles just fine all of REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT, STRICT_LOW_PART, PC and even some further cases. So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG of REG, PC (at least the REG and PC cases are triggered on the testcase) and quite possibly also MEM (SUBREG of MEM not, see below). Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that function for constants just returns false, for PC returns true, for REG returns reg_set_between_p, for MEM recurses on the address, for MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code whether the memory could have been modified in between, for all other rtxes recurses on the subrtxes. This part didn't change in my patch. I've only changed those - && !modified_between_p (SET_DEST (set{1,0}), i2, i3) + && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3) where the former has been described above and clearly handles all of REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things. The replacement reg_used_between_p calls reg_overlap_mentioned_p on each instruction in between i2 and i3. So, there is clearly a difference in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p returns unconditionally true even if there are no instructions in between, but reg_used_between_p if there are no non-debug insns in between returns false. Sorry for missing that, guess I should check for that (with the exception of the noop moves which are often (set (pc) (pc)) and handled by the incremental patch). In fact not just that, reg_used_between_p will only return true for PC if it is mentioned anywhere in the insns in between. Anyway, except for that, for REG it calls refers_to_regno_p and so should find any occurrences of any of the REG or parts of it for hard registers, for MEM returns true if it sees any MEMs in insns in between (conservatively), for SUBREGs apparently it relies on it being SUBREG of REG (so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the SUBREG_REG, PC I've already described. Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think already the initial && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), XVECEXP (newpat, 0, 0)) && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), XVECEXP (newpat, 0, 1)) calls would have failed --enable-checking=rtl or would have misbehaved, so I think there is no need to check for it further. To your question why I don't use reg_referenced_p, that is because reg_referenced_p is something to call on one insn pattern, while reg_used_between_p is pretty much that on all insns in between two instructions (excluding the boundaries). So, I think it would be safer to add && SET_DEST
Re: [PATCH v4 0/3] [AArch64, OpenMP] Support SVE types with various OpenMP clauses and constructs
Ping. Thanks, Tejas. On 3/18/25 11:27 AM, Tejas Belagod wrote: This series is based on a previous thread and review comments from RichardS and Jakub upstream: https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677072.html The changes suggested are cosmetic in nature. As suggested in the previous series v3, this series doesn't include tests - they're going to be part of a separate patch series. Patches have been rebased, retested and bootstrapped on aarch64-linux-gcc. OK for trunk? Richard Sandiford (1): gomp: Various fixes for SVE types [PR101018] Tejas Belagod (2): Add function to strip pointer type and get down to the actual pointee type. AArch64: Diagnose OpenMP offloading when SVE types involved. gcc/config/aarch64/aarch64-sve-builtins.cc | 37 - gcc/fold-const.cc | 7 +++ gcc/gimplify.cc| 60 ++ gcc/omp-low.cc | 2 +- gcc/poly-int.h | 19 +++ gcc/target.h | 37 - gcc/tree.h | 9 7 files changed, 157 insertions(+), 14 deletions(-)
Re: [PATCH v2] RISC-V: Tweak testcase for PIE
Push to trunk since the CI is green :) On Tue, Apr 1, 2025 at 3:43 PM Kito Cheng wrote: > > Linux toolchain may configured with --enable-default-pie, and that will > cause lots of regression test failures because the function name will > append with @plt suffix (e.g. `call foo` become `call foo@plt`), also > some code generation will different due to the code model like the address > generation for global variable, so we may add -fno-pie to those > testcases to prevent that. > > We may consider just drop @plt suffix to prevent that at all, because > it's not difference between w/ and w/o @plt suffix, the linker will pick > the right one to do, however it's late stage of GCC development, so just > tweak the testcase should be the best way to do now. > > Changes from v1: > - Add more testcase for PIE (from rvv.exp). > - Tweak the rule for match @plt. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rv32i_zcmp.c: Tweak testcase for PIE. > * gcc.target/riscv/rv32e_zcmp.c: Likewise. > * gcc.target/riscv/zcmp_stack_alignment.c: Likewise. > * gcc.target/riscv/cm_mv_rv32.c: Likewise. > * gcc.target/riscv/cpymem-64.c: Likewise. > * gcc.target/riscv/fmax-snan.c: Likewise. > * gcc.target/riscv/fmaxf-snan.c: Likewise. > * gcc.target/riscv/fmin-snan.c: Likewise. > * gcc.target/riscv/fminf-snan.c: Likewise. > * gcc.target/riscv/large-model.c: Likewise. > * gcc.target/riscv/predef-1.c: Likewise. > * gcc.target/riscv/predef-4.c: Likewise. > * gcc.target/riscv/predef-7.c: Likewise. > * gcc.target/riscv/predef-9.c: Likewise. > * gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: > Likewise. > * gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise. > * gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Likewise. > * gcc.target/riscv/rvv/base/cmpmem-1.c: Likewise. > * gcc.target/riscv/rvv/base/cmpmem-3.c: Likewise. > * gcc.target/riscv/rvv/base/cmpmem-4.c: Likewise. > * gcc.target/riscv/rvv/base/cpymem-1.c: Likewise. > * gcc.target/riscv/rvv/base/movmem-1.c: Likewise. > * gcc.target/riscv/rvv/base/pr114352-3.c: Likewise. > * gcc.target/riscv/rvv/base/setmem-1.c: Likewise. > * gcc.target/riscv/rvv/base/setmem-2.c: Likewise. > * gcc.target/riscv/rvv/base/setmem-3.c: Likewise. > * gcc.target/riscv/rvv/base/spill-9.c: Likewise. > * g++.target/riscv/mv-symbols1.C: Likewise. > * g++.target/riscv/mv-symbols3.C: Likewise. > * g++.target/riscv/mv-symbols4.C: Likewise. > * g++.target/riscv/mv-symbols5.C: Likewise. > * g++.target/riscv/mvc-symbols1.C: Likewise. > * g++.target/riscv/mvc-symbols3.C: Likewise. > --- > gcc/testsuite/g++.target/riscv/mv-symbols1.C| 4 ++-- > gcc/testsuite/g++.target/riscv/mv-symbols3.C| 4 ++-- > gcc/testsuite/g++.target/riscv/mv-symbols4.C| 4 ++-- > gcc/testsuite/g++.target/riscv/mv-symbols5.C| 4 ++-- > gcc/testsuite/g++.target/riscv/mvc-symbols1.C | 4 ++-- > gcc/testsuite/g++.target/riscv/mvc-symbols3.C | 4 ++-- > gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 4 ++-- > gcc/testsuite/gcc.target/riscv/cpymem-64.c | 4 ++-- > gcc/testsuite/gcc.target/riscv/fmax-snan.c | 2 +- > gcc/testsuite/gcc.target/riscv/fmaxf-snan.c | 2 +- > gcc/testsuite/gcc.target/riscv/fmin-snan.c | 2 +- > gcc/testsuite/gcc.target/riscv/fminf-snan.c | 2 +- > gcc/testsuite/gcc.target/riscv/large-model.c| 2 +- > gcc/testsuite/gcc.target/riscv/predef-1.c | 2 +- > gcc/testsuite/gcc.target/riscv/predef-4.c | 2 +- > gcc/testsuite/gcc.target/riscv/predef-7.c | 2 +- > gcc/testsuite/gcc.target/riscv/predef-9.c | 2 +- > gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c | 6 +++--- > gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c | 6 +++--- > .../riscv/rvv/base/abi-callee-saved-2-save-restore.c| 6 +++--- > .../gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c | 6 +++--- > .../gcc.target/riscv/rvv/base/abi-callee-saved-2.c | 6 +++--- > gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c | 4 ++-- > gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c | 4 ++-- > gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c | 4 ++-- > gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c | 4 ++-- > gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c | 2 +- > gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c| 4 ++-- > gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c | 2 +- > gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c | 2 +- > gcc/testsuite/gcc.target/riscv/rvv/
Re: [PATCH] [testsuite] [riscv] limit vwaddsub-1.c to rv64
On Tue, Apr 1, 2025 at 12:47 PM Jeff Law wrote: > > > > On 3/31/25 7:03 PM, Alexandre Oliva wrote: > > On Mar 31, 2025, Jeff Law wrote: > >> I don't immediately see anything in this test or its history to > >> indicate it's only supposed to work for rv64. > > > > It's the 64-bit integral argument rs1. > Right, but ISTM we ought to be able to handle a vector of 64bit integral > types, especially from the intrinsics interface. > > Probably the thing to do is see what the intrinsics docs say. Or maybe > Kito knows offhand. Long story, fortunately we have some record on that: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/9 https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/199 > > > > > > On rv64, it's in a single 64-bit register, that needs to be narrowed to > > 32 bits and then sign extended to 64 bits to fit the semantics of > > vwadd.vx. > Ah. A vx form. That's the key here. Indeed I don't think we can have > a 64bit argument for the scalar argument in a .vx form insn. > > So objection cleared. OK for the trunk. > jeff > >
Re: [PATCH] Further use of mod_scope in modified_type_die
> "Richard" == Richard Biener writes: Richard> On Fri, Mar 28, 2025 at 8:32 AM Eric Botcazou wrote: >> >> > Let's involve Eric here, I have no idea what was intended here. >> >> We discussed it with Tom internally and he makes the final decisions here. Richard> Well the, Tom, go ahead. Thanks. I made the change you suggested up-thread, and re-tested the patch. I'm going to check it in now. Tom
Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]
On 3/31/25 23:48, Heinrich Schuchardt wrote: > On 3/30/25 01:49, Vineet Gupta wrote: >> changes since v2 >> - dump log sanfu >> >> --- >> vsetvl phase4 uses LCM guided info to insert VSETVL insns. >> It has an additional loop to insert missing vsetvls on certain edges. >> Currently it asserts/aborts on encountering EDGE_ABNORMAL. >> When enabling go frontend with V enabled, libgo build hits the assert. >> >> It seems to be safe to just skip the abnormal edge. > > Hello Vineet, > > Is there a test case where only following an abnormal edge between code > blocks would require to call VSETVL? > > In the sketched code below following the exception would require VSETVL > to be called while the edge from the try block to the finally block > would not require this. > > try { > for (..) { > uint16_t vectorizable math > if (condition) > throw exception > uint16_t vectorizable math > } > for (..) { > uint8_t vectorizable math > } > } catch exception { > } finally > for (..) { > uint8_t vectorizable math > } > } Yeah we are going to run testsuite with -fnon-call-exceptions to find such cases. But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs. code complexity trade-off). After all we'd just endup with an extra VSETVL in the finally block. Thx, -Vineet
Re: [PATCH v2] c++: fix missing lifetime extension [PR119383]
On 3/31/25 4:59 PM, Marek Polacek wrote: On Thu, Mar 27, 2025 at 11:27:04AM -0400, Jason Merrill wrote: On 3/25/25 3:37 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- Since r15-8011 cp_build_indirect_ref_1 won't do the *&TARGET_EXPR -> TARGET_EXPR folding not to change its value category. That fix is correct but it made us stop extending the lifetime in this testcase, causing a wrong-code issue -- extend_ref_init_temps_1 did not see through the extra *& because it doesn't use a tree walk. It is not hard to fix that, but there may be other places that need this adjustment. :/ Hmm, this suggests to me that we should revert the 117512 patch and instead fix it in build_over_call, perhaps by forcing 'val' to be an lvalue after it's built rather than relying on 'to' being an lvalue already. Thanks for the suggestion. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? @@ -3870,13 +3870,11 @@ cp_build_indirect_ref_1 (location_t loc, tree ptr, ref_operator errorstring, return error_mark_node; } else if (do_fold && TREE_CODE (pointer) == ADDR_EXPR - && same_type_p (t, TREE_TYPE (TREE_OPERAND (pointer, 0))) - /* Don't let this change the value category. '*&TARGET_EXPR' - is an lvalue, but folding it into 'TARGET_EXPR' would turn - it into a prvalue of class type. */ - && lvalue_p (TREE_OPERAND (pointer, 0))) + && same_type_p (t, TREE_TYPE (TREE_OPERAND (pointer, 0 /* The POINTER was something like `&x'. We simplify `*&x' to - `x'. */ + `x'. This change the value category: '*&TARGET_EXPR' "This might change" or "can change". OK with that tweak. Jason
[COMMITTED 26/35] gccrs: Fix ICE during const expr eval on array expressions
From: Philip Herron Array expressions are still getting turned into VIEW_CONVERT_EXPR's becuase TYPE_MAIN_VARIANT is not set so then we might as well reuse the type-hasher to sort this out. Fixes Rust-GCC#3588 gcc/rust/ChangeLog: * backend/rust-compile-context.h: only push named types * backend/rust-compile-type.cc (TyTyResolveCompile::visit): run the type hasher gcc/testsuite/ChangeLog: * rust/compile/issue-3588.rs: New test. Signed-off-by: Philip Herron --- gcc/rust/backend/rust-compile-context.h | 5 - gcc/rust/backend/rust-compile-type.cc| 2 ++ gcc/testsuite/rust/compile/issue-3588.rs | 5 + 3 files changed, 11 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/rust/compile/issue-3588.rs diff --git a/gcc/rust/backend/rust-compile-context.h b/gcc/rust/backend/rust-compile-context.h index a44638817db..ce81a1d0db2 100644 --- a/gcc/rust/backend/rust-compile-context.h +++ b/gcc/rust/backend/rust-compile-context.h @@ -72,7 +72,10 @@ public: return it->second; compiled_type_map.insert ({h, type}); -push_type (type); + +if (TYPE_NAME (type) != NULL) + push_type (type); + return type; } diff --git a/gcc/rust/backend/rust-compile-type.cc b/gcc/rust/backend/rust-compile-type.cc index 813e11c47cf..83e5756429f 100644 --- a/gcc/rust/backend/rust-compile-type.cc +++ b/gcc/rust/backend/rust-compile-type.cc @@ -481,6 +481,8 @@ TyTyResolveCompile::visit (const TyTy::ArrayType &type) tree folded_capacity_expr = fold_expr (capacity_expr); translated = Backend::array_type (element_type, folded_capacity_expr); + if (translated != error_mark_node) +translated = ctx->insert_compiled_type (translated); } void diff --git a/gcc/testsuite/rust/compile/issue-3588.rs b/gcc/testsuite/rust/compile/issue-3588.rs new file mode 100644 index 000..744d9671c42 --- /dev/null +++ b/gcc/testsuite/rust/compile/issue-3588.rs @@ -0,0 +1,5 @@ +const FOO: i32 = if true { [1, 2, 3] } else { [2, 3, 4] }[0]; + +pub fn test() -> i32 { +FOO +} -- 2.49.0