Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended
On Wed, 2025-01-22 at 10:53 +0800, Xi Ruoyao wrote:
> On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote:
> >
> > On 2025/1/22 8:49 AM, Xi Ruoyao wrote:
> > > The second source register of this insn cannot be the same as the
> > > destination register.
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* config/loongarch/loongarch.md
> > > 	(<optab>_alsl_reversesi_extended): Add '&' to the destination
> > > 	register constraint and append '0' to the first source register
> > > 	constraint to indicate the destination register cannot be same
> > > 	as the second source register, and change the split condition to
> > > 	reload_completed so that the insn will be split only after RA in
> > > 	order to obtain allocated registers that satisfy the above
> > > 	constraints.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
> > > 	test.
> > > ---
> > >  gcc/config/loongarch/loongarch.md             |  6 +++---
> > >  .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++
> > >  2 files changed, 24 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
> > >
> > > diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> > > index 223e2b9f37f..1392325038c 100644
> > > --- a/gcc/config/loongarch/loongarch.md
> > > +++ b/gcc/config/loongarch/loongarch.md
> > > @@ -3160,13 +3160,13 @@ (define_insn_and_split "<optab>_shift_reverse<X:mode>"
> > >  ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
> > >  ;; our own.
> > >  (define_insn_and_split "<optab>_alsl_reversesi_extended"
> > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > +  [(set (match_operand:DI 0 "register_operand" "=&r")
> > >      (sign_extend:DI
> > >        (plus:SI
> > >          (subreg:SI
> > >            (any_bitwise:DI
> > >              (ashift:DI
> > > -              (match_operand:DI 1 "register_operand" "r")
> > > +              (match_operand:DI 1 "register_operand" "r0")
> > >                (match_operand:SI 2 "const_immalsl_operand" ""))
> > >              (match_operand:DI 3 "const_int_operand" "i"))
> > >            0)
> > > @@ -3175,7 +3175,7 @@ (define_insn_and_split "<optab>_alsl_reversesi_extended"
> > >     && loongarch_reassoc_shift_bitwise (<is_and>, operands[2], operands[3],
> > > 					   SImode)"
> > >    "#"
> > > -  "&& true"
> > > +  "&& reload_completed"
> >
> > I have no problem with this patch.
> >
> > But, I have always been confused about the use of reload_completed.
> >
> > I can understand that it needs to be true here, but I don't quite
> > understand the following:
> >
> > ```
> >
> > (define_insn_and_split "*zero_extendsidi2_internal"
> >   [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
> >     (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m,ZC,k")))]
> >   "TARGET_64BIT"
> >   "@
> >    bstrpick.d\t%0,%1,31,0
> >    ld.wu\t%0,%1
> >    #
> >    ldx.wu\t%0,%1"
> >   "&& reload_completed
>
> I don't really understand it either...  It is here since day 1 LoongArch
> support was added and I've never had enough courage to hack this part.

I pushed the 1st patch as an emergency wrong-code fix now.  The 2nd patch
can wait until we figure out the best way to support the fusion.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] c++/modules: Fix linkage checks for exported using-decls
On 21/01/2025 21:58, Jason Merrill wrote:

On 1/15/25 7:36 PM, yxj-github-437 wrote:

On Fri, Jan 03, 2025 at 05:18:55PM +, xxx wrote:

From: yxj-github-437 <2457369...@qq.com>

This patch attempts to fix an error when building module std.  The reason
for the error is that __builtin_va_list (aka struct __va_list) has internal
linkage, so attempt to handle this builtin type by identifying whether
DECL_SOURCE_LOCATION (entity) is BUILTINS_LOCATION.

Hi, thanks for the patch!  I suspect this may not be sufficient to
completely avoid issues with the __gnuc_va_list type; in particular, if it
has internal linkage that may prevent it from being referred to in other
ways by inline functions in named modules (due to P1815).

Maybe a better approach would be to instead mark this builtin type as
TREE_PUBLIC (presumably in aarch64_build_builtin_va_list)?

Thanks, I changed my patch to mark it TREE_PUBLIC.

Looks good to me if the ARM maintainers don't object.

Looks OK to me.
R.

This patch is small enough not to worry about copyright, but
"yxj-github-437 <2457369...@qq.com>" seems like a placeholder name; what
name would you like the commit to use?

-- >8 --

This patch attempts to fix an error when building module std.  The reason
for the error is that __builtin_va_list (aka struct __va_list) has internal
linkage, so mark this builtin type as TREE_PUBLIC to give struct __va_list
external linkage.
/x/gcc-15.0.0/usr/bin/aarch64-linux-android-g++ -fmodules -std=c++23 -fPIC -O2 -fsearch-include-path bits/std.cc -c
/x/gcc-15.0.0/usr/lib/gcc/aarch64-linux-android/15.0.0/include/c++/bits/std.cc:3642:14: error: exporting ‘typedef __gnuc_va_list va_list’ that does not have external linkage
 3642 |   using std::va_list;
      |              ^~~
: note: ‘struct __va_list’ declared here with internal linkage

gcc
	* config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): mark
	__builtin_va_list as TREE_PUBLIC
	* config/arm/arm.cc (arm_build_builtin_va_list): mark
	__builtin_va_list as TREE_PUBLIC
	* testsuite/g++.dg/modules/builtin-8.C: New test
---
 gcc/config/aarch64/aarch64.cc            | 1 +
 gcc/config/arm/arm.cc                    | 1 +
 gcc/testsuite/g++.dg/modules/builtin-8.C | 9 +
 3 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/builtin-8.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ad31e9d255c..e022526e573 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -21566,6 +21566,7 @@ aarch64_build_builtin_va_list (void)
 			  get_identifier ("__va_list"),
 			  va_list_type);
   DECL_ARTIFICIAL (va_list_name) = 1;
+  TREE_PUBLIC (va_list_name) = 1;
   TYPE_NAME (va_list_type) = va_list_name;
   TYPE_STUB_DECL (va_list_type) = va_list_name;

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 1e0791dc8c2..86838ebde5f 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -2906,6 +2906,7 @@ arm_build_builtin_va_list (void)
 			  get_identifier ("__va_list"),
 			  va_list_type);
   DECL_ARTIFICIAL (va_list_name) = 1;
+  TREE_PUBLIC (va_list_name) = 1;
   TYPE_NAME (va_list_type) = va_list_name;
   TYPE_STUB_DECL (va_list_type) = va_list_name;

   /* Create the __ap field.  */

diff --git a/gcc/testsuite/g++.dg/modules/builtin-8.C b/gcc/testsuite/g++.dg/modules/builtin-8.C
new file mode 100644
index 000..ff91104e4a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/builtin-8.C
@@ -0,0 +1,9 @@
+// { dg-additional-options -fmodules-ts }
+module;
+#include
+export module builtins;
+// { dg-module-cmi builtins }
+
+export {
+  using ::va_list;
+}
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Hello,

On Tue, 21 Jan 2025, Martin Uecker wrote:

> > > Couldn't you use the rule that .len refers to the closest enclosing
> > > structure even without __self__?  This would then also disambiguate
> > > between designators and other uses.
> >
> > Right now, an expression cannot start with '.', which provides the
> > disambiguation between designators and expressions as initializers.
>
> You could disambiguate directly after parsing the identifier, which
> does not seem overly problematic.

Which way?  When you allow ". identifier" as primary expression, then in

  struct S s = { .x = 42 };

the initializer can be parsed as a designated initializer (with an error
when 'x' is not a member of S) or as an assignment expression like in

  struct T t = { foo = 42 };

You need to decide which is which after seeing the ".".  I'm guessing what
you mean is that on seeing ".ident" as the first two tokens inside an
initializer-list you go the designator route, and not the
initializer/assignment-expression route, even though the latter can now
also start with ".ident".  But then, what about:

  struct U u = { .y };

It's certainly not a designation anymore, but you only know that after not
seeing the '='.  So here it's actually an assignment-expression.  And as
you allowed ".ident" as primary expression this might rightfully refer to
some in-scope 'y' member of some outer struct (or give an error).

Note further that you may have '{ .y[1][3].z }', which is still not a
designation, but an expression under your proposal, whereas
'{ .y[1][3].z = 1 }' would remain a designation.  This shows that you now
need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.

Ciao,
Michael.
[PATCH] c++/modules: Fix exporting temploid friends in header units [PR118582]
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- When we started streaming the bit to handle merging of imported temploid friends in r15-2807, I unthinkingly only streamed it in the '!state->is_header ()' case. This patch reworks the streaming logic to ensure that this data is always streamed, including for unique entities (in case that ever comes up somehow). This does make the streaming slightly less efficient, as functions and types will need an extra byte, but this doesn't appear to make a huge difference to the size of the resulting module; the 'std' module on my machine grows by 0.2% from 30671136 to 30730144 bytes. PR c++/118582 gcc/cp/ChangeLog: * module.cc (trees_out::decl_value): Always stream imported_temploid_friends information. (trees_in::decl_value): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/pr118582_a.H: New test. * g++.dg/modules/pr118582_b.H: New test. * g++.dg/modules/pr118582_c.H: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 47 +++ gcc/testsuite/g++.dg/modules/pr118582_a.H | 16 gcc/testsuite/g++.dg/modules/pr118582_b.H | 6 +++ gcc/testsuite/g++.dg/modules/pr118582_c.H | 5 +++ 4 files changed, 49 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_a.H create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_b.H create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_c.H diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 813c1436141..17215594fd3 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -2791,7 +2791,7 @@ static keyed_map_t *keyed_table; /* Instantiations of temploid friends imported from another module need to be attached to the same module as the temploid. This maps - these decls to the temploid they are instantiated them, as there is + these decls to the temploid they are instantiated from, as there is no other easy way to get this information. 
*/ static GTY((cache)) decl_tree_cache_map *imported_temploid_friends; @@ -7961,7 +7961,6 @@ trees_out::decl_value (tree decl, depset *dep) } merge_kind mk = get_merge_kind (decl, dep); - bool is_imported_temploid_friend = imported_temploid_friends->get (decl); if (CHECKING_P) { @@ -7998,10 +7997,6 @@ trees_out::decl_value (tree decl, depset *dep) is_attached = true; bits.b (is_attached); - - /* Also tell the importer whether this is an imported temploid -friend, which has implications for merging. */ - bits.b (is_imported_temploid_friend); } bits.b (dep && dep->has_defn ()); } @@ -8087,6 +8082,16 @@ trees_out::decl_value (tree decl, depset *dep) tree container = decl_container (decl); unsigned tpl_levels = 0; + /* Also tell the importer whether this is a temploid friend attached + to a different module (which has implications for merging), so that + importers can reconstruct this information on stream-in. */ + if (TREE_CODE (inner) == FUNCTION_DECL || TREE_CODE (inner) == TYPE_DECL) +{ + tree* temploid_friend_slot = imported_temploid_friends->get (decl); + gcc_checking_assert (!temploid_friend_slot || *temploid_friend_slot); + tree_node (temploid_friend_slot ? *temploid_friend_slot : NULL_TREE); +} + { auto wmk = make_temp_override (dep_hash->writing_merge_key, true); if (decl != inner) @@ -8182,14 +8187,6 @@ trees_out::decl_value (tree decl, depset *dep) } } - if (is_imported_temploid_friend) -{ - /* Write imported temploid friends so that importers can reconstruct -this information on stream-in. 
*/ - tree* slot = imported_temploid_friends->get (decl); - tree_node (*slot); -} - bool is_typedef = false; if (!type && TREE_CODE (inner) == TYPE_DECL) { @@ -8266,7 +8263,6 @@ trees_in::decl_value () { int tag = 0; bool is_attached = false; - bool is_imported_temploid_friend = false; bool has_defn = false; unsigned mk_u = u (); if (mk_u >= MK_hwm || !merge_kind_name[mk_u]) @@ -8287,10 +8283,7 @@ trees_in::decl_value () { bits_in bits = stream_bits (); if (!(mk & MK_template_mask) && !state->is_header ()) - { - is_attached = bits.b (); - is_imported_temploid_friend = bits.b (); - } + is_attached = bits.b (); has_defn = bits.b (); } @@ -8385,6 +8378,12 @@ trees_in::decl_value () tree container = decl_container (); unsigned tpl_levels = 0; + /* If this is an imported temploid friend, get the owning decl its + attachment is determined by (or NULL_TREE otherwise). */ + tree temploid_friend = NULL_TREE; + if (TREE_CODE (inner) == FUNCTION_DECL || TREE_CODE (inner) == TYPE_DECL) +temploid_friend = tree_node (); + /* Figure out i
[PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]
The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

	PR target/118501
	* config/loongarch/loongarch.md (@xorsign<mode>3): Use
	force_lowpart_subreg.
---
Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?

 gcc/config/loongarch/loongarch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 0325994ebd6..04a9a79d548 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1343,8 +1343,8 @@ (define_expand "@xorsign<mode>3"
   machine_mode lsx_mode = <MODE>mode == SFmode ? V4SFmode : V2DFmode;
   rtx tmp = gen_reg_rtx (lsx_mode);
-  rtx op1 = lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
-  rtx op2 = lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
+  rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
+  rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
   emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
   emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, tmp, lsx_mode));
--
2.48.1
Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR
> On 22 Jan 2025, at 13:53, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Hi Richard,
>>
>>> On 22 Jan 2025, at 13:21, Richard Sandiford wrote:
>>>
>>> GCC 15 is the first release to support FP8 intrinsics.
>>> The underlying instructions depend on the value of a new register,
>>> FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
>>> register rather than a global register.  So:
>>>
>>> - The FP8 intrinsics take a final uint64_t argument that
>>>   specifies what value FPMR should have.
>>>
>>> - If an FP8 operation is split across multiple functions,
>>>   it is likely that those functions would have a similar argument.
>>>
>>> If the object code has the structure:
>>>
>>>     for (...)
>>>       fp8_kernel (..., fpmr_value);
>>>
>>> then fp8_kernel would set FPMR to fpmr_value each time it is
>>> called, even though FPMR will already have that value for at
>>> least the second and subsequent calls (and possibly the first).
>>>
>>> The working assumption for the ABI has been that writes to
>>> registers like FPMR can in general be more expensive than
>>> reads and so it would be better to use a conditional write like:
>>>
>>>     mrs  tmp, fpmr
>>>     cmp  tmp, <value>
>>>     beq  1f
>>>     nsr  fpmr, <value>
>>
>> Typo “msr” here and in the comment in the code.
>
> Oops, thanks, will fix.
>
>> [...]
>>> @@ -1883,6 +1884,44 @@ (define_split
>>>  }
>>> )
>>>
>>> +;; The preferred way of writing to the FPMR is to test whether it already
>>> +;; has the desired value and branch around the write if so.  This reduces
>>> +;; the number of redundant FPMR writes caused by ABI boundaries, such as in:
>>> +;;
>>> +;;    for (...)
>>> +;;      fp8_kernel (..., fpmr_value);
>>> +;;
>>> +;; Without this optimization, fp8_kernel would set FPMR to fpmr_value each
>>> +;; time that it is called.
>>> +;;
>>> +;; We do this as a split so that hardreg_pre can optimize the moves first.
>>> +(define_split
>>> +  [(set (reg:DI FPM_REGNUM)
>>> +        (match_operand:DI 0 "aarch64_reg_or_zero"))]
>>> +  "TARGET_FP8 && !TARGET_CHEAP_FPMR_WRITE && can_create_pseudo_p ()"
>>> +  [(const_int 0)]
>>> +  {
>>> +    auto label = gen_label_rtx ();
>>> +    rtx current = copy_to_reg (gen_rtx_REG (DImode, FPM_REGNUM));
>>> +    rtx cond = gen_rtx_EQ (VOIDmode, current, operands[0]);
>>> +    emit_jump_insn (gen_cbranchdi4 (cond, current, operands[0], label));
>>
>> Do you think it’s worth marking this jump as likely?
>> In some other expand code in the backend where we emit jumps we sometimes
>> use aarch64_emit_unlikely_jump.
>
> Ah, yeah, I should have said that I'd wondered about that.  But in the
> end it didn't seem appropriate.  Given that hardreg_pre should remove
> local instances of redundancy, we don't really have much information
> about whether the branch is likely or unlikely.  I think instead the
> hope/expectation is that the branch has a predictable pattern.

Ok, thanks for clarifying.
Kyrill

>
> Thanks,
> Richard
[pushed: r15-7126] jit: fix startup on aarch64
libgccjit fails on startup on aarch64 (and probably other archs).  The
issues are that (a) within jit_langhook_init the call to
targetm.init_builtins can use types that aren't representable via
jit::recording::type, and (b) targetm.init_builtins can call
lang_hooks.decls.pushdecl, which although a no-op for libgccjit has a
gcc_unreachable.

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7126-g27470f9a818538.

gcc/jit/ChangeLog:
	* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
	bail out if the inner call to tree_type_to_jit_type fails.
	Don't abort on unknown types.
	(jit_langhook_pushdecl): Replace gcc_unreachable with return of
	NULL_TREE.

Signed-off-by: David Malcolm
---
 gcc/jit/dummy-frontend.cc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
index 574851696311..1d0080d6fecb 100644
--- a/gcc/jit/dummy-frontend.cc
+++ b/gcc/jit/dummy-frontend.cc
@@ -1278,6 +1278,8 @@ recording::type* tree_type_to_jit_type (tree type)
     {
       tree inner_type = TREE_TYPE (type);
       recording::type* element_type = tree_type_to_jit_type (inner_type);
+      if (!element_type)
+	return nullptr;
       return element_type->get_pointer ();
     }
   else
@@ -1299,10 +1301,6 @@ recording::type* tree_type_to_jit_type (tree type)
 	}
     }
 }
-
-  fprintf (stderr, "Unknown type:\n");
-  debug_tree (type);
-  abort ();
 }

   return NULL;
@@ -1372,7 +1370,7 @@ jit_langhook_global_bindings_p (void)
 static tree
 jit_langhook_pushdecl (tree decl ATTRIBUTE_UNUSED)
 {
-  gcc_unreachable ();
+  return NULL_TREE;
 }

 static tree
--
2.26.3
Re: [PATCH v2 01/12] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.
On 1/21/25 10:16 PM, Jakub Jelinek wrote:
> On Fri, Oct 18, 2024 at 11:52:22AM +0530, Tejas Belagod wrote:
> > Currently poly-int type structures are passed by value to OpenMP runtime
> > functions for shared clauses etc.  This patch improves on this by passing
> > around poly-int structures by address to avoid copy-overhead.
> >
> > gcc/ChangeLog
> > 	* omp-low.c (use_pointer_for_field): Use pointer if the OMP data
> > 	structure's field type is a poly-int.
>
> I think I've acked this one earlier already.
> It is still ok.

Thanks Jakub for the reviews.  Just to clarify - this series is all now
for Stage 1, I'm guessing?

Thanks,
Tejas.

> > ---
> >  gcc/omp-low.cc | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
> > index da2051b0279..6b3853ed528 100644
> > --- a/gcc/omp-low.cc
> > +++ b/gcc/omp-low.cc
> > @@ -466,7 +466,8 @@ static bool
> >  use_pointer_for_field (tree decl, omp_context *shared_ctx)
> >  {
> >    if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
> > -      || TYPE_ATOMIC (TREE_TYPE (decl)))
> > +      || TYPE_ATOMIC (TREE_TYPE (decl))
> > +      || POLY_INT_CST_P (DECL_SIZE (decl)))
> >      return true;
> >
> >    /* We can only use copy-in/copy-out semantics for shared variables
> > --
> > 2.25.1
>
> 	Jakub
Re: [pushed: r15-7126] jit: fix startup on aarch64
Hi David.

I had a patch for this here: https://github.com/antoyo/libgccjit/pull/20

The fact that you removed the debug_tree (and abort) will make it harder
to figure out what the missing types to handle are.  This will also
probably make it hard for people to understand why they get a type error
when calling a builtin function with an unsupported type.  And as you can
see in my PR, at least a few types were missing that you didn't add in
your patch.  Do you have a better solution for this?

I just thought about this potential solution: perhaps if we get an
unsupported type, we could add the builtin to an array instead of the
hashmap: this way, we could tell the user that this builtin is not
currently supported.

What are your thoughts on this?

Thanks.

On 2025-01-22 at 08:38, David Malcolm wrote:
> libgccjit fails on startup on aarch64 (and probably other archs).  The
> issues are that (a) within jit_langhook_init the call to
> targetm.init_builtins can use types that aren't representable via
> jit::recording::type, and (b) targetm.init_builtins can call
> lang_hooks.decls.pushdecl, which although a no-op for libgccjit has a
> gcc_unreachable.
>
> Fixed thusly.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as r15-7126-g27470f9a818538.
>
> gcc/jit/ChangeLog:
> 	* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
> 	bail out if the inner call to tree_type_to_jit_type fails.
> 	Don't abort on unknown types.
> 	(jit_langhook_pushdecl): Replace gcc_unreachable with return of
> 	NULL_TREE.
>
> Signed-off-by: David Malcolm
> ---
>  gcc/jit/dummy-frontend.cc | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
> index 574851696311..1d0080d6fecb 100644
> --- a/gcc/jit/dummy-frontend.cc
> +++ b/gcc/jit/dummy-frontend.cc
> @@ -1278,6 +1278,8 @@ recording::type* tree_type_to_jit_type (tree type)
>      {
>        tree inner_type = TREE_TYPE (type);
>        recording::type* element_type = tree_type_to_jit_type (inner_type);
> +      if (!element_type)
> +	return nullptr;
>        return element_type->get_pointer ();
>      }
>    else
> @@ -1299,10 +1301,6 @@ recording::type* tree_type_to_jit_type (tree type)
>  	}
>      }
>  }
> -
> -  fprintf (stderr, "Unknown type:\n");
> -  debug_tree (type);
> -  abort ();
>  }
>
>    return NULL;
> @@ -1372,7 +1370,7 @@ jit_langhook_global_bindings_p (void)
>  static tree
>  jit_langhook_pushdecl (tree decl ATTRIBUTE_UNUSED)
>  {
> -  gcc_unreachable ();
> +  return NULL_TREE;
>  }
>
>  static tree
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Hello Michael,

On Wednesday, 2025-01-22 at 16:54 +0100, Michael Matz wrote:
> On Wed, 22 Jan 2025, Martin Uecker wrote:
>
> > > > So you do not need to look further.  But maybe I am missing
> > > > something else.
> > >
> > > Like ...
> > >
> > > > > Note further that you may have '{ .y[1][3].z }', which is still
> > > > > not a designation, but an expression under your proposal, whereas
> > > > > '{ .y[1][3].z = 1 }' would remain a designation.  This shows that
> > > > > you now need arbitrary look-ahead to disambiguate the two.  A
> > > > > Very Bad Idea.
> > >
> > > ... this?
> >
> > In .y[1][3].z after .y you can decide whether y is a member of the
> > struct being initialized.  If it is, it is a designator and if not
> > it must be an expression.
>
> If y is not a member it must be an expression, true.  But if it's a
> member you don't know, it may be a designation or an expression.

In an initializer I know all the members.

Martin
Re: [PATCH] c++, v2: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]
On 9/6/24 8:02 AM, Jakub Jelinek wrote: Hi! On Wed, Aug 14, 2024 at 06:11:35PM +0200, Jakub Jelinek wrote: Here is the I believe ABI compatible version, which uses the separate guard variables, so different structured binding variables can be initialized in different threads, but the thread that did the artificial base initialization will keep temporaries live at least until the last guard variable is released (i.e. when even that variable has been initialized). Bootstrapped/regtested on x86_64-linux and i686-linux on top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660354.html patch, ok for trunk? As for namespace scope structured bindings and this DR, all of set_up_extended_ref_temp, cp_finish_decl -> expand_static_init and cp_finish_decl -> cp_finish_decomp -> cp_finish_decl -> expand_static_init in that case just push some decls into the static_aggregates or tls_aggregates chains. So, we can end up e.g. with the most important decl for a extended ref temporary (which initializes some temporaries), then perhaps some more of those, then DECL_DECOMPOSITION_P base, then n times optionally some further extended refs and DECL_DECOMPOSITION_P non-base and I think we need to one_static_initialization_or_destruction all of them together, by omitting CLEANUP_POINT_EXPR from the very first one (or all until the DECL_DECOMPOSITION_P base?), say through temporarily clearing stmts_are_full_exprs_p and then wrapping whatever one_static_initialization_or_destruction produces for all of those into a single CLEANUP_POINT_EXPR argument. Perhaps remember static_aggregates or tls_aggregates early before any check_initializer etc. calls and then after cp_finish_decomp cut that TREE_LIST nodes and pass that as a separate TREE_VALUE in the list. Though, not sure what to do about modules.cc uses of these, it needs to save/restore that stuff somehow too. 
Now that the CWG 2867 patch for automatic structured bindings is in, here is an updated version of the block scope static structured bindings CWG 2867 patch. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. No patch for the namespace scope structured bindings yet, will work on that soon. 2024-09-05 Jakub Jelinek PR c++/115769 * decl.cc: Partially implement CWG 2867 - Order of initialization for structured bindings. (cp_finish_decl): If need_decomp_init, for function scope structure binding bases, temporarily clear stmts_are_full_exprs_p before calling expand_static_init, after it call cp_finish_decomp and wrap code emitted by both into maybe_cleanup_point_expr_void and ensure cp_finish_decomp isn't called again. * g++.dg/DRs/dr2867-3.C: New test. * g++.dg/DRs/dr2867-4.C: New test. --- gcc/cp/decl.cc.jj 2024-09-04 19:55:59.046491602 +0200 +++ gcc/cp/decl.cc 2024-09-04 20:04:35.695952219 +0200 @@ -9140,7 +9140,24 @@ cp_finish_decl (tree decl, tree init, bo initializer. It is not legal to redeclare a static data member, so this issue does not arise in that case. */ else if (var_definition_p && TREE_STATIC (decl)) - expand_static_init (decl, init); + { + if (decomp && DECL_FUNCTION_SCOPE_P (decl)) + { + tree sl = push_stmt_list (); + auto saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p (); + current_stmt_tree ()->stmts_are_full_exprs_p = 0; + expand_static_init (decl, init); + current_stmt_tree ()->stmts_are_full_exprs_p + = saved_stmts_are_full_exprs_p; + cp_finish_decomp (decl, decomp); + decomp = NULL; + sl = pop_stmt_list (sl); + sl = maybe_cleanup_point_expr_void (sl); + add_stmt (sl); + } + else + expand_static_init (decl, init); + } } /* If a CLEANUP_STMT was created to destroy a temporary bound to a --- gcc/testsuite/g++.dg/DRs/dr2867-3.C.jj 2024-08-13 21:05:42.876446125 +0200 +++ gcc/testsuite/g++.dg/DRs/dr2867-3.C 2024-08-13 21:05:42.876446125 +0200 @@ -0,0 +1,159 @@ +// CWG2867 - Order of initialization for structured bindings. 
+// { dg-do run { target c++11 } } +// { dg-options "" } + +#define assert(X) do { if (!(X)) __builtin_abort(); } while (0) + +namespace std { + template struct tuple_size; + template struct tuple_element; +} + +int a, c, d, i; + +struct A { + A () { assert (c == 3); ++c; } + ~A () { ++a; } + template int &get () const { assert (c == 5 + I); ++c; return i; } +}; + +template <> struct std::tuple_size { static const int value = 4; }; +template struct std::tuple_element { using type = int; }; +template <> struct std::tuple_size { static const int value = 4; }; +template struct std::tuple_element { using type = int; }; + +struct B { + B () { assert (c >= 1 && c <= 2); ++c; } + ~B () { assert (c >= 9 && c <= 10); ++c; } +}; + +struct C { + conste
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Hello,

On Wed, 22 Jan 2025, Martin Uecker wrote:

> > > In .y[1][3].z after .y you can decide whether y is a member of the
> > > struct being initialized.  If it is, it is a designator and if not
> > > it must be an expression.
> >
> > If y is not a member it must be an expression, true.  But if it's a
> > member you don't know, it may be a designation or an expression.
>
> In an initializer I know all the members.

My sentence was ambiguous :-)  Trying again: when it's a member, and you
know it's a member, then you still don't know if it's going to be a
designation or an expression.  It can be both.

Ciao,
Michael.
[PATCH] builtins: Store unspecified value to *exp for inf/nan [PR114877]
Hi! The fold_builtin_frexp folding for NaN/Inf just returned the first argument with evaluating second arguments side-effects, rather than storing something to what the second argument points to. The PR argues that the C standard requires the function to store something there but what exactly is stored is unspecified, so not storing there anything can result in UB if the value isn't initialized and is read later. glibc and newlib store there 0, musl apparently doesn't store anything. The following patch stores there zero (or would you prefer storing there some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form in assembly though) and adjusts the test so that it doesn't rely on not storing there anything but instead checks for -Wmaybe-uninitialized warning to find out that something has been stored there. Unfortunately I had to disable the NaN tests for -O0, while we can fold __builtin_isnan (__builtin_nan ("")) at compile time, we can't fold __builtin_isnan ((i = 0, __builtin_nan (""))) at compile time. fold_builtin_classify uses just tree_expr_nan_p and if that isn't true (because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg), it does arg = builtin_save_expr (arg); return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg); and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and nothing propagates the NAN to the comparison. I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR: added and recurse on the second argument, but that feels like stage1 material to me if we want to do that at all. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-01-22 Jakub Jelinek PR middle-end/114877 * builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases like rvc_zero, return passed in arg and set *exp = 0. * gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as dg-additional-options. (bar): New function. 
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether nothing has been stored to what the second argument points to, but instead that something has been stored there, whatever it is. (main): Temporarily don't enable the nan tests for -O0. --- gcc/builtins.cc.jj 2025-01-22 09:15:41.700176403 +0100 +++ gcc/builtins.cc 2025-01-22 09:42:42.764927141 +0100 @@ -9574,14 +9574,16 @@ fold_builtin_frexp (location_t loc, tree switch (value->cl) { case rvc_zero: + case rvc_nan: + case rvc_inf: /* For +-0, return (*exp = 0, +-0). */ + /* For +-NaN or +-Inf, *exp is unspecified, but something should + be stored there so that it isn't read from uninitialized object. + As glibc and newlib store *exp = 0 for +-Inf/NaN, storing + 0 here as well is easiest. */ exp = integer_zero_node; frac = arg0; break; - case rvc_nan: - case rvc_inf: - /* For +-NaN or +-Inf, *exp is unspecified, return arg0. */ - return omit_one_operand_loc (loc, rettype, arg0, arg1); case rvc_normal: { /* Since the frexp function always expects base 2, and in --- gcc/testsuite/gcc.dg/torture/builtin-frexp-1.c.jj 2020-01-12 11:54:37.546396315 +0100 +++ gcc/testsuite/gcc.dg/torture/builtin-frexp-1.c 2025-01-22 10:31:49.497851625 +0100 @@ -11,6 +11,7 @@ floating point formats need -funsafe-math-optimizations. */ /* { dg-require-effective-target inf } */ /* { dg-options "-funsafe-math-optimizations" { target powerpc*-*-* } } */ +/* { dg-additional-options "-Wmaybe-uninitialized" } */ extern void link_error(int); @@ -52,22 +53,36 @@ extern void link_error(int); link_error(__LINE__); \ } while (0) +int __attribute__ ((__noipa__)) +bar (int x) +{ + (void) x; + return 42; +} + /* Test that FUNCRES(frexp(NEG FUNCARG(ARGARG),&i)) is false. Check - the sign as well. Ensure side-effects are evaluated in i. */ + the sign as well. Ensure side-effects are evaluated in the second + frexp argument. 
*/ #define TESTIT_FREXP2(NEG,FUNCARG,ARGARG,FUNCRES) do { \ - int i=5; \ + int i, j = 5; \ if (!__builtin_##FUNCRES##f(__builtin_frexpf(NEG __builtin_##FUNCARG##f(ARGARG),&i)) \ - || CKSGN_F(__builtin_frexpf(NEG __builtin_##FUNCARG##f(ARGARG),(i++,&i)), NEG __builtin_##FUNCARG##f(ARGARG)) \ - || CKEXP(i,6)) \ + || CKSGN_F(__builtin_frexpf(NEG __builtin_##FUNCARG##f(ARGARG),(j++,&i)), NEG __builtin_##FUNCARG##f(ARGARG)) \ + || CKEXP(j,6)) \ link_error(__LINE__); \ + if (CKEXP(bar(i),42)) \ +__builtin_abort(); \ if (!__builtin_##FUNCRES(__builtin_frexp(NEG __builtin_##FUNCARG(ARGARG),&i)) \ - || CKSGN(__builtin_frexp(NEG __builtin_##FUNCARG(ARGARG),(i++,&i)), NEG __builtin_##FUNCARG(ARGARG)) \ - || CKEXP(i,7)) \ + || CKSGN(__builtin_frexp(NEG __builtin_##FUNCARG(ARGARG),(j++,&i)), NEG __builtin_##FUNCARG(ARGARG)) \ + ||
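The semantics the patch gives the folding can be sketched directly in C. This is an illustrative stand-in, not GCC's actual code: the function name is invented, only the special cases are modeled, and the normal-number (rvc_normal) path is omitted.

```c
/* Hypothetical sketch of the folding's special cases after the patch:
   for +-0, +-Inf and NaN the argument is returned unchanged and 0 is
   stored through the second argument (matching glibc/newlib), so the
   pointed-to int is never left uninitialized.  C11 7.12.6.4 leaves the
   stored value unspecified for Inf/NaN; 0 is the cheapest choice.  */
static double
frexp_special (double x, int *exp)
{
  if (x == 0 || __builtin_isinf (x) || __builtin_isnan (x))
    {
      *exp = 0;   /* rvc_zero, rvc_inf, rvc_nan: (*exp = 0, x) */
      return x;
    }
  /* rvc_normal not modeled; a real frexp decomposes x here.  */
  *exp = 0;
  return x;
}
```

Using only `__builtin_*` classification avoids a libm dependency, so the sketch links without `-lm`.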
Re: [PATCH v2 01/12] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.
On Wed, Jan 22, 2025 at 04:19:37PM +0530, Tejas Belagod wrote: > On 1/21/25 10:16 PM, Jakub Jelinek wrote: > > On Fri, Oct 18, 2024 at 11:52:22AM +0530, Tejas Belagod wrote: > > > Currently poly-int type structures are passed by value to OpenMP runtime > > > functions for shared clauses etc. This patch improves on this by passing > > > around poly-int structures by address to avoid copy-overhead. > > > > > > gcc/ChangeLog > > > * omp-low.c (use_pointer_for_field): Use pointer if the OMP data > > > structure's field type is a poly-int. > > > > I think I've acked this one earlier already. > > It is still ok. > > > > Thanks Jakub for the reviews. Just to clarify - this series is all now for > Stage 1, I'm guessing? Not necessarily, but likely. That said, I think the use_pointer_for_field patch can be committed separately now, doesn't have to wait for the rest. The general idea behind the tests would be to test something that users could be naturally using, so for worksharing constructs test something parallelizable, with multiple threads doing some work (of course, for the testcase purposes it doesn't need to be some really meaningful work, just something simple) and when testing the various data sharing or mapping of the variable length types, it should check that they are actually handled correctly (so for something shared see if multiple threads can read the shared variable (and otherwise perhaps do some slightly different work rather than the same in all threads), then perhaps in a parallel after a #pragma omp barrier write it in one of the threads, then after another #pragma omp barrier try to read it again in all the threads and verify each thread sees the one written earlier; for the privatization clauses verify that they are really private, let each thread write a different value to them concurrently and say after a barrier verify they read what they actually wrote, plus test the various extra properties of the privatization clauses, say for firstprivate test 
that all threads read the expected value set before the parallel region, for lastprivate that the value from the last iteration or section has been propagated back to the original item, etc.). In any case the tests shouldn't have data races. Some tests e.g. for tasks would be nice too. Perhaps as the work for the different threads or different iterations of say omp for or omp loop or omp distribute you could use helper functions that take some SVE vectors as arguments (and perhaps the iteration or thread number as another, so that the work is different in each) and/or return them, and e.g. for the privatization also check passing the address of an SVE variable to a helper function and reading the value there. Now, obviously somewhere in the gomp/libgomp testsuites we have tests for the corner cases, like the behavior of a parallel with a single thread or worksharing constructs outside of an explicit parallel, but I think doing the same for SVE isn't really needed, or at least it should be tested in addition to tests actually testing something parallelized. Jakub
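As a hypothetical sketch of the style of test being asked for (a plain fixed-size struct stands in for an SVE type, and all names here are invented), the following checks that firstprivate gives every thread a copy of the pre-parallel value and that lastprivate propagates the value from the last iteration back to the original item. The checks also hold if the pragma is ignored and the loop runs serially, so the test is race-free either way.

```c
/* Sketch: firstprivate/lastprivate semantics check.  The struct is a
   stand-in for the variable-length (SVE) type the real tests would use. */
struct vec { int lanes[8]; };

static int
run_test (void)
{
  struct vec v;
  for (int j = 0; j < 8; j++)
    v.lanes[j] = j;                 /* lane sum == 28 before the region */

  int sums[16];
  int last_first_lane = -1;

#pragma omp parallel for firstprivate (v) lastprivate (last_first_lane)
  for (int i = 0; i < 16; i++)
    {
      int s = 0;
      for (int j = 0; j < 8; j++)
        s += v.lanes[j];            /* reads this thread's private copy */
      sums[i] = s + i;              /* work differs per iteration */
      last_first_lane = i;          /* only i == 15 may survive the loop */
    }

  for (int i = 0; i < 16; i++)
    if (sums[i] != 28 + i)
      return 1;                     /* firstprivate copy was corrupted */
  return last_first_lane == 15 ? 0 : 2;
}
```

Built with -fopenmp and several threads this exercises the data-sharing paths; the real testsuite versions would use actual SVE types and additionally verify the barrier/shared-write scenarios described above.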
[patch,avr,applied] Add tests for LRA's PR118591
Added 2 tests for PR118591. Johann -- AVR: Add test cases for PR118591. gcc/testsuite/ PR rtl-optimization/118591 * gcc.target/avr/torture/pr118591-1.c: New test. * gcc.target/avr/torture/pr118591-2.c: New test. diff --git a/gcc/testsuite/gcc.target/avr/torture/pr118591-1.c b/gcc/testsuite/gcc.target/avr/torture/pr118591-1.c new file mode 100644 index 000..814f0410a7f --- /dev/null +++ b/gcc/testsuite/gcc.target/avr/torture/pr118591-1.c @@ -0,0 +1,22 @@ +/* { dg-do run { target { ! avr_tiny } } } */ +/* { dg-additional-options "-std=c99 -mlra" } */ + +__attribute__((noipa)) +void func2 (long long a1, long long a2, long b) +{ + static unsigned char count = 0; + if (b != count++) +__builtin_abort (); +} + +int main (void) +{ + for (long b = 0; b < 5; ++b) +{ + __asm ("" ::: "r5", "r9", "r24", "r20", "r16", "r12", "r30"); + + func2 (0, 0, b); +} + + return 0; +} diff --git a/gcc/testsuite/gcc.target/avr/torture/pr118591-2.c b/gcc/testsuite/gcc.target/avr/torture/pr118591-2.c new file mode 100644 index 000..83d36060088 --- /dev/null +++ b/gcc/testsuite/gcc.target/avr/torture/pr118591-2.c @@ -0,0 +1,21 @@ +/* Test case failed on avrtiny. */ +/* { dg-do run } */ +/* { dg-additional-options "-std=c99 -mlra" } */ + +__attribute__((noipa)) +void func2 (long a, long b) +{ + static unsigned char count = 0; + if (b != count++) +__builtin_abort (); +} + +int main (void) +{ + for (long b = 0; b < 5; ++b) +{ + func2 (0, b); +} + + return 0; +}
Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR
Hi Richard, > On 22 Jan 2025, at 13:21, Richard Sandiford wrote: > > GCC 15 is the first release to support FP8 intrinsics. > The underlying instructions depend on the value of a new register, > FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save > register rather than a global register. So: > > - The FP8 intrinsics take a final uint64_t argument that > specifies what value FPMR should have. > > - If an FP8 operation is split across multiple functions, > it is likely that those functions would have a similar argument. > > If the object code has the structure: > >for (...) > fp8_kernel (..., fpmr_value); > > then fp8_kernel would set FPMR to fpmr_value each time it is > called, even though FPMR will already have that value for at > least the second and subsequent calls (and possibly the first). > > The working assumption for the ABI has been that writes to > registers like FPMR can in general be more expensive than > reads and so it would be better to use a conditional write like: > > mrs tmp, fpmr > cmp tmp, > beq 1f > nsr fpmr, Typo “msr” here and in the comment in the code. > 1: > > instead of writing the same value to FPMR repeatedly. > > This patch implements that. It also adds a tuning flag that suppresses > the behaviour, both to make testing easier and to support any future > cores that (for example) are able to rename FPMR. > > Hopefully this really is the last part of the FP8 enablement. > > Tested on aarch64-linux-gnu. I'll push in about 24 hours > if there are no comments before then. > > Richard > > > gcc/ > * config/aarch64/aarch64-tuning-flags.def > (AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag. > * config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro. > * config/aarch64/aarch64.md: Split moves into FPMR into a test > and branch around. > (aarch64_write_fpmr): New pattern. > > gcc/testsuite/ > * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add > cheap_fpmr_write by default. 
> * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise. > * gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write. > * gcc.target/aarch64/acle/fpmr-2.c: Likewise. > * gcc.target/aarch64/simd/vcvt_fpm.c: Likewise. > * gcc.target/aarch64/simd/vdot2_fpm.c: Likewise. > * gcc.target/aarch64/simd/vdot4_fpm.c: Likewise. > * gcc.target/aarch64/simd/vmla_fpm.c: Likewise. > * gcc.target/aarch64/acle/fpmr-6.c: New test. > --- > gcc/config/aarch64/aarch64-tuning-flags.def | 15 +++ > gcc/config/aarch64/aarch64.h | 5 +++ > gcc/config/aarch64/aarch64.md | 39 +++ > .../sve2/acle/aarch64-sve2-acle-asm.exp | 2 +- > gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 2 +- > .../gcc.target/aarch64/acle/fpmr-2.c | 2 +- > .../gcc.target/aarch64/acle/fpmr-6.c | 36 + > .../gcc.target/aarch64/simd/vcvt_fpm.c| 2 +- > .../gcc.target/aarch64/simd/vdot2_fpm.c | 2 +- > .../gcc.target/aarch64/simd/vdot4_fpm.c | 2 +- > .../gcc.target/aarch64/simd/vmla_fpm.c| 2 +- > .../sve2/acle/aarch64-sve2-acle-asm.exp | 2 +- > 12 files changed, 103 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fpmr-6.c > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def > b/gcc/config/aarch64/aarch64-tuning-flags.def > index 60967aac903..7a67d6197d9 100644 > --- a/gcc/config/aarch64/aarch64-tuning-flags.def > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def > @@ -48,6 +48,21 @@ AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", > FULLY_PIPELINED_FMA) >rather than re-use an input predicate register. */ > AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW) > > +/* Whether writes to the FPMR are cheap enough that: > + > + msr fpmr, > + > + is better than: > + > + mrs tmp, fpmr > + cmp tmp, > + beq 1f > + nsr fpmr, > + 1: > + > + even when the branch is predictably taken. */ > +AARCH64_EXTRA_TUNING_OPTION ("cheap_fpmr_write", CHEAP_FPMR_WRITE) > + > /* Baseline tuning settings suitable for all modern cores. 
*/ > #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND \ > | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA) > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index 218868a5246..5cbf442130b 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -486,6 +486,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE > ATTRIBUTE_UNUSED > /* fp8 instructions are enabled through +fp8. */ > #define TARGET_FP8 AARCH64_HAVE_ISA (FP8) > > +/* See the comment above the tuning flag for details. */ > +#define TARGET_CHEAP_FPMR_WRITE \ > + (bool (aarch64_tune_params.extra_tuning_flags \ > + & AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE)) > + > /* Combinatorial tests. */ > > #define TARGET_SVE2_OR_SME2 \ > diff --git a/gcc/config/aarch64/aarch6
Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR
Kyrylo Tkachov writes: > Hi Richard, > >> On 22 Jan 2025, at 13:21, Richard Sandiford >> wrote: >> >> GCC 15 is the first release to support FP8 intrinsics. >> The underlying instructions depend on the value of a new register, >> FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save >> register rather than a global register. So: >> >> - The FP8 intrinsics take a final uint64_t argument that >> specifies what value FPMR should have. >> >> - If an FP8 operation is split across multiple functions, >> it is likely that those functions would have a similar argument. >> >> If the object code has the structure: >> >>for (...) >> fp8_kernel (..., fpmr_value); >> >> then fp8_kernel would set FPMR to fpmr_value each time it is >> called, even though FPMR will already have that value for at >> least the second and subsequent calls (and possibly the first). >> >> The working assumption for the ABI has been that writes to >> registers like FPMR can in general be more expensive than >> reads and so it would be better to use a conditional write like: >> >> mrs tmp, fpmr >> cmp tmp, >> beq 1f >> nsr fpmr, > > Typo “msr” here and in the comment in the code. Oops, thanks, will fix. > [...] >> @@ -1883,6 +1884,44 @@ (define_split >> } >> ) >> >> +;; The preferred way of writing to the FPMR is to test whether it already >> +;; has the desired value and branch around the write if so. This reduces >> +;; the number of redundant FPMR writes caused by ABI boundaries, such as in: >> +;; >> +;;for (...) >> +;; fp8_kernel (..., fpmr_value); >> +;; >> +;; Without this optimization, fp8_kernel would set FPMR to fpmr_value each >> +;; time that it is called. >> +;; >> +;; We do this as a split so that hardreg_pre can optimize the moves first. 
>> +(define_split >> + [(set (reg:DI FPM_REGNUM) >> +(match_operand:DI 0 "aarch64_reg_or_zero"))] >> + "TARGET_FP8 && !TARGET_CHEAP_FPMR_WRITE && can_create_pseudo_p ()" >> + [(const_int 0)] >> + { >> +auto label = gen_label_rtx (); >> +rtx current = copy_to_reg (gen_rtx_REG (DImode, FPM_REGNUM)); >> +rtx cond = gen_rtx_EQ (VOIDmode, current, operands[0]); >> +emit_jump_insn (gen_cbranchdi4 (cond, current, operands[0], label)); > > Do you think it’s worth marking this jump as likely? > In some other expand code in the backend where we emit jumps we sometimes use > aarch64_emit_unlikely_jump. Ah, yeah, I should have said that I'd wondered about that. But in the end it didn't seem appropriate. Given that hardreg_pre should remove local instances of redundancy, we don't really have much information about whether the branch is likely or unlikely. I think instead the hope/expectation is that the branch has a predictable pattern. Thanks, Richard
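In C terms, the read-compare-branch sequence that the split emits behaves like the sketch below (names are hypothetical; `fpmr` is just a variable standing in for the architected register). The point is that a loop repeatedly passing the same fpmr_value performs one architectural write instead of N.

```c
#include <stdint.h>

static uint64_t fpmr;        /* stands in for the architected register */
static unsigned writes;      /* counts actual architectural writes (msr) */

static void
set_fpmr (uint64_t value)
{
  if (fpmr != value)         /* mrs tmp, fpmr; cmp tmp, value; beq 1f */
    {
      fpmr = value;          /* msr fpmr, value */
      writes++;
    }                        /* 1: */
}

/* Model the fp8_kernel loop from the commit message: every call sets
   FPMR to the same value, but only the first write reaches the register. */
static unsigned
count_writes_for_loop (void)
{
  writes = 0;
  for (int i = 0; i < 100; i++)
    set_fpmr (0x2a);
  return writes;
}
```

With TARGET_CHEAP_FPMR_WRITE the guard is skipped and every call would perform the write, which matches the tuning flag's intent for cores that can rename FPMR.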
[PUSHED] s390: Fix arch15 machine string for binutils
gcc/ChangeLog: * config/s390/s390.cc: Fix arch15 machine string which must not be empty. --- gcc/config/s390/s390.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index 313f968c87e..86a5f059b85 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -342,7 +342,7 @@ const struct s390_processor processor_table[] = { "z14","arch12", PROCESSOR_3906_Z14,&zEC12_cost, 12 }, { "z15","arch13", PROCESSOR_8561_Z15,&zEC12_cost, 13 }, { "z16","arch14", PROCESSOR_3931_Z16,&zEC12_cost, 14 }, - { "arch15", "", PROCESSOR_ARCH15, &zEC12_cost, 15 }, + { "arch15", "arch15", PROCESSOR_ARCH15, &zEC12_cost, 15 }, { "native", "", PROCESSOR_NATIVE, NULL, 0 } }; -- 2.47.0
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Hello, On Wed, 22 Jan 2025, Martin Uecker wrote: > > > So you do not need to look further. But maybe I am missing something > > > else. > > > > Like ... > > > > > > Note further that you may have '{ .y[1][3].z }', which is still not a > > > > designation, but an expression under your proposal, whereas > > > > '{ .y[1][3].z = 1 }' would remain a designation. This shows that you > > > > now need arbitrary look-ahead to disambiguate the two. A Very Bad Idea. > > > > ... this? > > In .y[1][3].z after .y you can decide whether y is a member of the > struct being initialized. If it is, it is a designator and if not > it must be an expression. If y is not a member it must be an expression, true. But if it's a member you don't know, it may be a designation or an expression. Ciao, Michael.
[PATCH 1/3] aarch64: Allow FPMR source values to be zero
GCC 15 is going to be the first release to support FPMR. The alternatives for moving values into FPMR were missing a zero alternative, meaning that moves of zero would use an unnecessary temporary register. Tested on aarch64-linux-gnu. I'll push in about 24 hours if there are no comments before then. Richard gcc/ * config/aarch64/aarch64.md (*mov_aarch64) (*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR to be zero. gcc/testsuite/ * gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR. --- gcc/config/aarch64/aarch64.md | 50 ++--- gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 47 +++ 2 files changed, 72 insertions(+), 25 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 39655ea5e39..776c4c4ceee 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1531,7 +1531,7 @@ (define_insn "*mov_aarch64" [w, r Z ; neon_from_gp, nosimd ] fmov\t%s0, %w1 [w, w; neon_dup , simd ] dup\t%0, %1.[0] [w, w; neon_dup , nosimd ] fmov\t%s0, %s1 - [Umv, r ; mrs, * ] msr\t%0, %x1 + [Umv, rZ ; mrs, * ] msr\t%0, %x1 [r, Umv ; mrs, * ] mrs\t%x0, %1 } ) @@ -1595,7 +1595,7 @@ (define_insn_and_split "*movsi_aarch64" [r , w ; f_mrc, fp , 4] fmov\t%w0, %s1 [w , w ; fmov , fp , 4] fmov\t%s0, %s1 [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode); - [Umv, r ; mrs , * , 4] msr\t%0, %x1 + [Umv, rZ ; mrs , * , 4] msr\t%0, %x1 [r, Umv ; mrs , * , 4] mrs\t%x0, %1 } "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode) @@ -1613,30 +1613,30 @@ (define_insn_and_split "*movdi_aarch64" "(register_operand (operands[0], DImode) || aarch64_reg_or_zero (operands[1], DImode))" {@ [cons: =0, 1; attrs: type, arch, length] - [w, Z ; neon_move, simd, 4] movi\t%0.2d, #0 - [r, r ; mov_reg , * , 4] mov\t%x0, %x1 - [k, r ; mov_reg , * , 4] mov\t%0, %x1 - [r, k ; mov_reg , * , 4] mov\t%x0, %1 - [r, O ; mov_imm , * , 4] << aarch64_is_mov_xn_imm (INTVAL 
(operands[1])) ? "mov\t%x0, %1" : "mov\t%w0, %1"; - [r, n ; mov_imm , * ,16] # + [w, Z ; neon_move, simd, 4] movi\t%0.2d, #0 + [r, r ; mov_reg , * , 4] mov\t%x0, %x1 + [k, r ; mov_reg , * , 4] mov\t%0, %x1 + [r, k ; mov_reg , * , 4] mov\t%x0, %1 + [r, O ; mov_imm , * , 4] << aarch64_is_mov_xn_imm (INTVAL (operands[1])) ? "mov\t%x0, %1" : "mov\t%w0, %1"; + [r, n ; mov_imm , * ,16] # /* The "mov_imm" type for CNT is just a placeholder. */ - [r, Usv; mov_imm , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); - [r, Usr; mov_imm , sve, 4] << aarch64_output_sve_rdvl (operands[1]); - [r, UsR; mov_imm , sme, 4] << aarch64_output_rdsvl (operands[1]); - [r, m ; load_8 , * , 4] ldr\t%x0, %1 - [w, m ; load_8 , fp , 4] ldr\t%d0, %1 - [m, r Z; store_8 , * , 4] str\t%x1, %0 - [m, w ; store_8 , fp , 4] str\t%d1, %0 - [r, Usw; load_8 , * , 8] << TARGET_ILP32 ? "adrp\t%0, %A1\;ldr\t%w0, [%0, %L1]" : "adrp\t%0, %A1\;ldr\t%0, [%0, %L1]"; - [r, Usa; adr , * , 4] adr\t%x0, %c1 - [r, Ush; adr , * , 4] adrp\t%x0, %A1 - [w, r Z; f_mcr, fp , 4] fmov\t%d0, %x1 - [r, w ; f_mrc, fp , 4] fmov\t%x0, %d1 - [w, w ; fmov , fp , 4] fmov\t%d0, %d1 - [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], DImode); - [w, Dx ; neon_move, simd, 8] # - [Umv, r; mrs , * , 4] msr\t%0, %1 - [r, Umv; mrs , * , 4] mrs\t%0, %1 + [r, Usv ; mov_imm , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); + [r, Usr ; mov_imm , sve, 4] << aarch64_output_sve_rdvl (operands[1]); + [r, UsR ; mov_imm , sme, 4] << aarch64_output_rdsvl (operands[1]); + [r, m ; load_8 , * , 4] ldr\t%x0, %1 + [w, m ; load_8 , fp , 4] ldr\t%d0, %1 + [m, r Z ; store_8 , * , 4] str\t%x1, %0 + [m, w ; store_8 , fp , 4] str\t%d1, %0 + [r, Usw ; load_8 , * , 8] << TARGET_ILP32 ? 
"adrp\t%0, %A1\;ldr\t%w0, [%0, %L1]" : "adrp\t%0, %A1\;ldr\t%0, [%0, %L1]"; + [r, Usa ; adr , * , 4] adr\t%x0, %c1 + [r, Ush ; adr , * , 4] adrp\t%x0, %A1 + [w, r Z ; f_mcr, fp , 4] fmov\t%d0, %x1 + [r, w ; f_mrc, fp , 4] fmov\t%x0, %d1 + [w, w ; fmov , fp , 4] fmov\t%d0, %d1 + [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], DImode); + [w, Dx ; neon_move, simd, 8] # + [Umv, rZ; mrs , * , 4] msr\t%0, %x1 + [r, Umv ; mrs , * , 4] mrs\t%0, %1 } "CONST_INT
[PATCH 2/3] aarch64: Fix memory cost for FPM_REGNUM
GCC 15 is going to be the first release to support FPMR. While working on a follow-up patch, I noticed that for: (set (reg:DI R) ...) ... (set (reg:DI fpmr) (reg:DI R)) IRA would prefer to spill R to memory rather than allocate a GPR. This is because the register move cost for GENERAL_REGS to MOVEABLE_SYSREGS is very high: /* Moves to/from sysregs are expensive, and must go via GPR. */ if (from == MOVEABLE_SYSREGS) return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to); if (to == MOVEABLE_SYSREGS) return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS); but the memory cost for MOVEABLE_SYSREGS was the same as for GENERAL_REGS, making memory much cheaper. Loading and storing FPMR involves a GPR temporary, so the cost should account for moving into and out of that temporary. This did show up indirectly in some of the existing asm tests, where the stack frame allocated 16 bytes for callee saves (D8) and another 16 bytes for spilling a temporary register. It's possible that other registers need the same treatment and it's more than probable that this code needs a rework. None of that seems suitable for stage 4 though. Tested on aarch64-linux-gnu. I'll push in about 24 hours if there are no comments before then. Richard gcc/ * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account for the cost of moving in and out of GENERAL_SYSREGS. gcc/testsuite/ * gcc.target/aarch64/acle/fpmr-5.c: New test. * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect a spill slot to be allocated. * gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise. 
--- gcc/config/aarch64/aarch64.cc| 11 +-- gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c | 16 .../aarch64/sve2/acle/asm/dot_lane_mf8.c | 4 ++-- .../aarch64/sve2/acle/asm/mlalb_lane_mf8.c | 2 +- .../aarch64/sve2/acle/asm/mlallbb_lane_mf8.c | 2 +- .../aarch64/sve2/acle/asm/mlallbt_lane_mf8.c | 2 +- .../aarch64/sve2/acle/asm/mlalltb_lane_mf8.c | 2 +- .../aarch64/sve2/acle/asm/mlalltt_lane_mf8.c | 2 +- .../aarch64/sve2/acle/asm/mlalt_lane_mf8.c | 2 +- 9 files changed, 33 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index dba779a8e51..a1f5619a615 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -15858,9 +15858,16 @@ aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in) ? aarch64_tune_params.memmov_cost.load_fp : aarch64_tune_params.memmov_cost.store_fp); + /* If the move needs to go through GPRs, add the cost of doing that. */ + int base = 0; + if (rclass_i == MOVEABLE_SYSREGS) +base += (in +? aarch64_register_move_cost (DImode, GENERAL_REGS, rclass_i) +: aarch64_register_move_cost (DImode, rclass_i, GENERAL_REGS)); + return (in - ? aarch64_tune_params.memmov_cost.load_int - : aarch64_tune_params.memmov_cost.store_int); + ? base + aarch64_tune_params.memmov_cost.load_int + : base + aarch64_tune_params.memmov_cost.store_int); } /* Implement TARGET_INSN_COST. 
We have the opportunity to do something diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c b/gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c new file mode 100644 index 000..da6d7f62f90 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c @@ -0,0 +1,16 @@ +/* { dg-options "-O" } */ + +#include + +void f(int cond) +{ + uint64_t x; + asm volatile ("" : "=r" (x)); + if (cond) +{ + register uint64_t fpmr asm ("fpmr") = x; + asm volatile ("" :: "Umv" (fpmr)); +} +} + +/* { dg-final { scan-assembler-not {\tsub\tsp,} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c index 9e54cd11c4b..83fe5cff5d3 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c @@ -70,7 +70,7 @@ TEST_DUAL_Z (dot_lane_1_f16, svfloat16_t, svmfloat8_t, ** msr fpmr, x0 ** mov (z[0-7])\.d, z8\.d ** fdotz0\.h, z1\.b, \1\.b\[1\] -** ldr d8, \[sp\], 32 +** ldr d8, \[sp\], 16 ** ret */ TEST_DUAL_LANE_REG (dot_lane_z8_f16, svfloat16_t, svmfloat8_t, z8, @@ -151,7 +151,7 @@ TEST_DUAL_Z (dot_lane_1_f32, svfloat32_t, svmfloat8_t, ** msr
[PATCH 3/3] aarch64: Avoid redundant writes to FPMR
GCC 15 is the first release to support FP8 intrinsics. The underlying instructions depend on the value of a new register, FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save register rather than a global register. So: - The FP8 intrinsics take a final uint64_t argument that specifies what value FPMR should have. - If an FP8 operation is split across multiple functions, it is likely that those functions would have a similar argument. If the object code has the structure: for (...) fp8_kernel (..., fpmr_value); then fp8_kernel would set FPMR to fpmr_value each time it is called, even though FPMR will already have that value for at least the second and subsequent calls (and possibly the first). The working assumption for the ABI has been that writes to registers like FPMR can in general be more expensive than reads and so it would be better to use a conditional write like: mrs tmp, fpmr cmp tmp, beq 1f nsr fpmr, 1: instead of writing the same value to FPMR repeatedly. This patch implements that. It also adds a tuning flag that suppresses the behaviour, both to make testing easier and to support any future cores that (for example) are able to rename FPMR. Hopefully this really is the last part of the FP8 enablement. Tested on aarch64-linux-gnu. I'll push in about 24 hours if there are no comments before then. Richard gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag. * config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro. * config/aarch64/aarch64.md: Split moves into FPMR into a test and branch around. (aarch64_write_fpmr): New pattern. gcc/testsuite/ * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add cheap_fpmr_write by default. * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise. * gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write. * gcc.target/aarch64/acle/fpmr-2.c: Likewise. * gcc.target/aarch64/simd/vcvt_fpm.c: Likewise. 
* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise. * gcc.target/aarch64/simd/vdot4_fpm.c: Likewise. * gcc.target/aarch64/simd/vmla_fpm.c: Likewise. * gcc.target/aarch64/acle/fpmr-6.c: New test. --- gcc/config/aarch64/aarch64-tuning-flags.def | 15 +++ gcc/config/aarch64/aarch64.h | 5 +++ gcc/config/aarch64/aarch64.md | 39 +++ .../sve2/acle/aarch64-sve2-acle-asm.exp | 2 +- gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 2 +- .../gcc.target/aarch64/acle/fpmr-2.c | 2 +- .../gcc.target/aarch64/acle/fpmr-6.c | 36 + .../gcc.target/aarch64/simd/vcvt_fpm.c| 2 +- .../gcc.target/aarch64/simd/vdot2_fpm.c | 2 +- .../gcc.target/aarch64/simd/vdot4_fpm.c | 2 +- .../gcc.target/aarch64/simd/vmla_fpm.c| 2 +- .../sve2/acle/aarch64-sve2-acle-asm.exp | 2 +- 12 files changed, 103 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fpmr-6.c diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 60967aac903..7a67d6197d9 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -48,6 +48,21 @@ AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) rather than re-use an input predicate register. */ AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW) +/* Whether writes to the FPMR are cheap enough that: + + msr fpmr, + + is better than: + + mrs tmp, fpmr + cmp tmp, + beq 1f + nsr fpmr, + 1: + + even when the branch is predictably taken. */ +AARCH64_EXTRA_TUNING_OPTION ("cheap_fpmr_write", CHEAP_FPMR_WRITE) + /* Baseline tuning settings suitable for all modern cores. 
*/ #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND \ | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 218868a5246..5cbf442130b 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -486,6 +486,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED /* fp8 instructions are enabled through +fp8. */ #define TARGET_FP8 AARCH64_HAVE_ISA (FP8) +/* See the comment above the tuning flag for details. */ +#define TARGET_CHEAP_FPMR_WRITE \ + (bool (aarch64_tune_params.extra_tuning_flags \ +& AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE)) + /* Combinatorial tests. */ #define TARGET_SVE2_OR_SME2 \ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 776c4c4ceee..071058dbeb3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -356,6 +356,7 @@ (d
[pushed] aarch64: Fix aarch64_write_sysregdi predicate
While working on another MSR-related patch, I noticed that aarch64_write_sysregdi's constraints allowed zero, but its predicate didn't. This could in principle lead to an ICE during or after RA, since "Z" allows the RA to rematerialise a known zero directly into the instruction. The usual techniques for exposing a bug like that didn't work in this case, since the optimisers seem to make no attempt to remove redundant zero moves (at least not for these unspec_volatiles). But the problem still seems worth fixing pre-emptively. Tested on aarch64-linux-gnu & pushed. Richard gcc/ * config/aarch64/aarch64.md (aarch64_read_sysregti): Change the source predicate to aarch64_reg_or_zero. gcc/testsuite/ * gcc.target/aarch64/acle/rwsr-4.c: New test. * gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized variable. --- gcc/config/aarch64/aarch64.md | 2 +- gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c| 15 +++ .../gcc.target/aarch64/acle/rwsr-armv8p9.c| 4 +--- 3 files changed, 17 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f8d82cee903..39655ea5e39 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -659,7 +659,7 @@ (define_insn "aarch64_read_sysregti" (define_insn "aarch64_write_sysregdi" [(unspec_volatile:DI [(match_operand 0 "aarch64_sysreg_string" "") - (match_operand:DI 1 "register_operand" "rZ")] + (match_operand:DI 1 "aarch64_reg_or_zero" "rZ")] UNSPEC_SYSREG_WDI)] "" "msr\t%0, %x1" diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c new file mode 100644 index 000..52fb603f3cf --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c @@ -0,0 +1,15 @@ +/* { dg-options "-O" } */ + +#include +#include + +void f(); +void g() +{ + int64_t x = 0; + __arm_wsr64("tpidr_el0", x); + f(); + __arm_wsr64("tpidr_el0", x); +} + +/* { dg-final { 
scan-assembler-times {\tmsr\t[^,]+, xzr\n} 2 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c index e2f297bbeeb..c49fbb5368e 100644 --- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c @@ -3,10 +3,8 @@ /* { dg-options "-O2 -march=armv8.9-a" } */ #include void -readwrite_armv8p9a_sysregs () +readwrite_armv8p9a_sysregs (long long int a) { - long long int a; - /* Write-only system registers. */ __arm_wsr64 ("pmzr_el0", a); /* { dg-final { scan-assembler "msr\ts3_3_c9_c13_4, x0" } } */ -- 2.25.1
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Hello, On Wed, 22 Jan 2025, Martin Uecker wrote: > > You need to decide which is which after seeing the ".". I'm guessing what > > you mean is that on seeing ".ident" as first two tokens inside in > > initializer-list you go the designator route, and not the > > initializer/assignment-expression route, even though the latter can now > > also start with ".ident". > > What I mean is that after parsing the dot followed by an identifier x, > if x is the name of a member of the structure S which is being initialized, > it is a designator, otherwise it is an expression that uses .x to refer > to some member of an enclosing definition. So, as I guessed. > So you do not need to look further. But maybe I am missing something > else. Like ... > > Note further that you may have '{ .y[1][3].z }', which is still not a > > designation, but an expression under your proposal, whereas > > '{ .y[1][3].z = 1 }' would remain a designation. This shows that you > > now need arbitrary look-ahead to disambiguate the two. A Very Bad Idea. ... this? Ciao, Michael.
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Am Mittwoch, dem 22.01.2025 um 16:25 +0100 schrieb Michael Matz: > Hello, > > On Wed, 22 Jan 2025, Martin Uecker wrote: > > > > You need to decide which is which after seeing the ".". I'm guessing > > > what > > > you mean is that on seeing ".ident" as first two tokens inside in > > > initializer-list you go the designator route, and not the > > > initializer/assignment-expression route, even though the latter can now > > > also start with ".ident". > > > > What I mean is that after parsing the dot followed by an identifier x, > > if x is the name of a member of the structure S which is being initialized, > > it is a designator, otherwise it is an expression that uses .x to refer > > to some member of an enclosing definition. > > So, as I guessed. I think you missed the part that you can lookup whether x is a member of the struct S. > > > So you do not need to look further. But maybe I am missing something > > else. > > Like ... > > > > Note further that you may have '{ .y[1][3].z }', which is still not a > > > designation, but an expression under your proposal, whereas > > > '{ .y[1][3].z = 1 }' would remain a designation. This shows that you > > > now need arbitrary look-ahead to disambiguate the two. A Very Bad Idea. > > ... this? In .y[1][3].z after .y you can decide whether y is a member of the struct being initialized. If it is, it is a designator and if not it must be an expression. Martin
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Am Mittwoch, dem 22.01.2025 um 15:53 +0100 schrieb Michael Matz: > Hello, > > On Tue, 21 Jan 2025, Martin Uecker wrote: > > > > > Coudn't you use the rule that .len refers to the closest enclosing > > > > structure > > > > even without __self__ ? This would then also disambiguate between > > > > designators > > > > and other uses. > > > > > > Right now, an expression cannot start with '.', which provides the > > > disambiguation between designators and expressions as initializers. > > > > You could disambiguate directly after parsing the identifier, which > > does not seem overly problematic. > > Which way? When you allow ". identifier" as primary expression, then in > > struct S s = { .x = 42 }; > > the initializer can be parsed as designated initializer (with error > when 'x' is not a member of S) or as assignment expression like in > > struct T t = { foo = 42 }; > > You need to decide which is which after seeing the ".". I'm guessing what > you mean is that on seeing ".ident" as first two tokens inside in > initializer-list you go the designator route, and not the > initializer/assignment-expression route, even though the latter can now > also start with ".ident". What I mean is that after parsing the dot followed by an identifier x, if x is the name of a member of the structure S which is being initialized, it is a designator, otherwise it is an expression that uses .x to refer to some member of an enclosing definition. So you do not need to look further. But maybe I am missing something else. Martin > But then, what about: > > struct U u = { .y }; > > It's certainly not a designation anymore, but you only know after not > seeing the '='. So here it's actually an assignment-expression. And as > you allowed ".ident" as primary expression this might rightfully refer to > some in-scope 'y' member of some outer struct (or give an error). 
> > Note further that you may have '{ .y[1][3].z }', which is still not a > designation, but an expression under your proposal, whereas > '{ .y[1][3].z = 1 }' would remain a designation. This shows that you > now need arbitrary look-ahead to disambiguate the two. A Very Bad Idea. > > > Ciao, > Michael. -- Univ.-Prof. Dr. rer. nat. Martin Uecker Graz University of Technology Institute of Biomedical Imaging
[PATCH v4] arm: [MVE intrinsics] Avoid warnings when floating-point is not supported [PR 117814]
If the target does not support floating-point, we register FP vector types as 'void' (see register_vector_type). This leads to warnings about 'pure attribute on function returning void' when we declare the various load intrinsics, because their call_properties say CP_READ_MEMORY (thus giving them the 'pure' attribute) while their return type is void. This happens for instance in gcc.target/arm/pr112337.c, depending on how GCC is built (I didn't notice the warnings because arm_mve.h is considered as a system include in my environment, and the warning is not emitted, but CI reported it). To avoid such warnings, declare floating-point scalar and vector types even if the target does not have an FPU. Note that since an FPU can be activated via #pragma GCC target ("arch=armv8.1-m.main+mve.fp" for instance), such types cannot appear and disappear within a single TU; they have to be available in all contexts. This implies a noteworthy change for __fp16: it no longer depends on using -mfp16-format=ieee or alternative. Also note that if the target ISA has the fp16 bit set, we already silently activate -mfp16-format=ieee (with an error if -mfp16-format=alternative was supplied). The patch now enforces -mfp16-format=none if the option was used. In arm-mve-builtins.cc (register_builtin_types, register_vector_type, register_builtin_tuple_types), this means simply removing the early exits. However, for this to work, we need to update arm_vector_mode_supported_p so that vector floating-point types are always defined, and __fp16 must always be registered by arm_init_fp16_builtins (as it is the base type for vectors of float16_t). Another side effect is that the declaration of the float16_t and float32_t typedefs is now unconditional. 
The new tests verify that:
- we emit an error if the code tries to use floating-point intrinsics and the target does not have the floating-point extension
- we emit the expected code when activating floating-point via a pragma
- we emit the expected code when the target supports floating-point (no pragma needed)
- we apply -mfp16-format=none where we used to default to ieee
An update is needed in g++.target/arm/mve/general-c++/nomve_fp_1.c, because the error message now correctly uses float16x8_t instead of void as return type. The patch removes gcc.target/arm/fp16-compile-none-1.c, which tests that using __fp16 produces an error with -mfp16-format=none, since that is no longer the case. gcc/ChangeLog: PR target/117814 * config/arm/arm-builtins.cc (arm_init_fp16_builtins): Always register __fp16 type. * config/arm/arm-mve-builtins.cc (register_builtin_types): Remove special handling when TARGET_HAVE_MVE_FLOAT is false. (register_vector_type): Likewise. (register_builtin_tuple_types): Likewise. * config/arm/arm-opts.h (arm_fp16_format_type): Add ARM_FP16_FORMAT_DEFAULT. * config/arm/arm.cc (arm_vector_mode_supported_p): Accept floating-point vector modes even if TARGET_HAVE_MVE_FLOAT is false. (arm_option_reconfigure_globals): Apply ARM_FP16_FORMAT_NONE if requested. * config/arm/arm.opt (mfp16-format): Default to ARM_FP16_FORMAT_DEFAULT. * config/arm/arm_mve_types.h (float16_t, float32_t): Define unconditionally. * doc/extend.texi (Half-precision Floating-point): __fp16 is now always available on arm. Move the x86 paragraph closer to the rest of the x86 information. * doc/sourcebuild.texi (ARM-specific attributes): Document arm_v8_1m_mve_nofp_ok. gcc/testsuite/ChangeLog: PR target/117814 * gcc.target/arm/mve/intrinsics/pr117814-f16.c: New test. * gcc.target/arm/mve/intrinsics/pr117814-2-f16.c: New test. * gcc.target/arm/mve/intrinsics/pr117814-3-f16.c: New test. * gcc.target/arm/mve/intrinsics/pr117814-4-f16.c: New test. 
* gcc.target/arm/mve/intrinsics/pr117814-f32.c: New test. * gcc.target/arm/mve/intrinsics/pr117814-2-f32.c: New test. * gcc.target/arm/mve/intrinsics/pr117814-3-f32.c: New test. * gcc.target/arm/fp16-compile-none-1.c: Delete. * g++.target/arm/mve/general-c++/nomve_fp_1.c: Fix expected error message. * lib/target-supports.exp (check_effective_target_arm_v8_1m_mve_nofp_ok_nocache): New. (check_effective_target_arm_v8_1m_mve_nofp_ok): New. (add_options_for_arm_v8_1m_mve_nofp): New. --- gcc/config/arm/arm-builtins.cc| 4 +- gcc/config/arm/arm-mve-builtins.cc| 22 + gcc/config/arm/arm-opts.h | 1 + gcc/config/arm/arm.cc | 14 +++--- gcc/config/arm/arm.opt| 2 +- gcc/config/arm/arm_mve_types.h| 2 - gcc/doc/extend.texi | 29 ++- gcc/doc/sourcebuild.texi | 6 +++
Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended
在 2025/1/22 下午5:21, Xi Ruoyao 写道: On Wed, 2025-01-22 at 10:53 +0800, Xi Ruoyao wrote: On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote: 在 2025/1/22 上午8:49, Xi Ruoyao 写道: The second source register of this insn cannot be the same as the destination register. gcc/ChangeLog: * config/loongarch/loongarch.md (_alsl_reversesi_extended): Add '&' to the destination register constraint and append '0' to the first source register constraint to indicate the destination register cannot be same as the second source register, and change the split condition to reload_completed so that the insn will be split only after RA in order to obtain allocated registers that satisfy the above constraints. gcc/testsuite/ChangeLog: * gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New test. --- gcc/config/loongarch/loongarch.md | 6 +++--- .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++ 2 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 223e2b9f37f..1392325038c 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -3160,13 +3160,13 @@ (define_insn_and_split "_shift_reverse" ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on ;; our own. 
(define_insn_and_split "_alsl_reversesi_extended" - [(set (match_operand:DI 0 "register_operand" "=r") + [(set (match_operand:DI 0 "register_operand" "=&r") (sign_extend:DI (plus:SI (subreg:SI (any_bitwise:DI (ashift:DI - (match_operand:DI 1 "register_operand" "r") + (match_operand:DI 1 "register_operand" "r0") (match_operand:SI 2 "const_immalsl_operand" "")) (match_operand:DI 3 "const_int_operand" "i")) 0) @@ -3175,7 +3175,7 @@ (define_insn_and_split "_alsl_reversesi_extended" && loongarch_reassoc_shift_bitwise (, operands[2], operands[3], SImode)" "#" - "&& true" + "&& reload_completed" I have no problem with this patch. But, I have always been confused about the use of reload_completed. I can understand that it needs to be true here, but I don't quite understand the following: ``` (define_insn_and_split "*zero_extendsidi2_internal" [(set (match_operand:DI 0 "register_operand" "=r,r,r,r") (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m,ZC,k")))] "TARGET_64BIT" "@ bstrpick.d\t%0,%1,31,0 ld.wu\t%0,%1 # ldx.wu\t%0,%1" "&& reload_completed I don't really understand it either... It is here since day 1 LoongArch support was added and I've never had enough courage to hack this part. I pushed the 1st patch as an emergency wrong-code fix now. The 2nd patch can wait until we figure out the best way to support the fusion. I agree. I also compared the previous assembly code when bstrpick_alsl_paired was not deleted to analyze this problem.
[PATCH 2/5] LoongArch: Don't use "+" for atomic_{load, store} "m" constraint
Atomic load does not modify the memory. Atomic store does not read the memory, thus we can use "=" instead. gcc/ChangeLog: * config/loongarch/sync.md (atomic_load): Remove "+" for the memory operand. (atomic_store): Use "=" instead of "+" for the memory operand. --- gcc/config/loongarch/sync.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 75b134cd853..15413e4df1b 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -105,7 +105,7 @@ (define_insn "mem_thread_fence_1" (define_insn "atomic_load" [(set (match_operand:QHWD 0 "register_operand" "=r") (unspec_volatile:QHWD - [(match_operand:QHWD 1 "memory_operand" "+m") + [(match_operand:QHWD 1 "memory_operand" "m") (match_operand:SI 2 "const_int_operand")];; model UNSPEC_ATOMIC_LOAD))] "" @@ -142,7 +142,7 @@ (define_insn "atomic_load" ;; Implement atomic stores with amoswap. Fall back to fences for atomic loads. (define_insn "atomic_store" - [(set (match_operand:QHWD 0 "memory_operand" "+m") + [(set (match_operand:QHWD 0 "memory_operand" "=m") (unspec_volatile:QHWD [(match_operand:QHWD 1 "reg_or_0_operand" "rJ") (match_operand:SI 2 "const_int_operand")] ;; model -- 2.48.1
[PATCH 5/5] LoongArch: Remove "b 3f" instruction if unneeded
This instruction is used to skip a redundant barrier when -mno-ld-seq-sa is in effect or the memory model requires a barrier on failure. But with -mld-seq-sa and other memory models the barrier may not exist at all, and we should remove the "b 3f" instruction as well. The implementation uses a new operand modifier "%T" to output a comment marker if the operand is a memory order for which the barrier won't be generated. "%T", and also "%t", were not really used before, and the code for them in loongarch_print_operand_reloc is just some MIPS legacy. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand_reloc): Make "%T" output a comment marker if the operand is a memory order for which the barrier won't be generated; remove "%t". * config/loongarch/sync.md (atomic_cas_value_strong): Add %T before "b 3f". (atomic_cas_value_cmp_and_7_): Likewise. --- gcc/config/loongarch/loongarch.cc | 19 --- gcc/config/loongarch/sync.md | 4 ++-- 2 files changed, 10 insertions(+), 13 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index e9978370e8c..341a92bc942 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -6183,9 +6183,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part, 'Q' Print R_LARCH_RELAX for TLS IE. 'r' Print address 12-31bit relocation associated with OP. 'R' Print address 32-51bit relocation associated with OP. - 'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...), - 'z' for (eq:?I ...), 'n' for (ne:?I ...). - 't' Like 'T', but with the EQ/NE cases reversed + 'T' Print a comment marker if %G outputs nothing. 'u' Print a LASX register. 'v' Print the insn size suffix b, h, w or d for vector modes V16QI, V8HI, V4SI, V2SI, and w, d for vector modes V4SF, V2DF respectively. 
@@ -6264,6 +6262,13 @@ loongarch_print_operand (FILE *file, rtx op, int letter) fputs ("dbar\t0x700", file); break; +case 'T': + if (!loongarch_cas_failure_memorder_needs_acquire ( + memmodel_from_int (INTVAL (op))) + && ISA_HAS_LD_SEQ_SA) + fprintf (file, "%s", ASM_COMMENT_START); + break; + case 'h': if (code == HIGH) op = XEXP (op, 0); @@ -6342,14 +6347,6 @@ loongarch_print_operand (FILE *file, rtx op, int letter) false /* lo_reloc */); break; -case 't': -case 'T': - { - int truth = (code == NE) == (letter == 'T'); - fputc ("zfnt"[truth * 2 + FCC_REG_P (REGNO (XEXP (op, 0)))], file); - } - break; - case 'V': if (CONST_VECTOR_P (op)) { diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 67cf9f47e5a..881d216aed0 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -267,7 +267,7 @@ (define_insn "atomic_cas_value_strong" output_asm_insn ("or%i3\t%5,$zero,%3", operands); output_asm_insn ("sc.\t%5,%1", operands); output_asm_insn ("beqz\t%5,1b", operands); - output_asm_insn ("b\t3f", operands); + output_asm_insn ("%T4b\t3f", operands); output_asm_insn ("2:", operands); output_asm_insn ("%G4", operands); output_asm_insn ("3:", operands); @@ -411,7 +411,7 @@ (define_insn "atomic_cas_value_cmp_and_7_" "or%i5\\t%7,%7,%5\\n\\t" "sc.\\t%7,%1\\n\\t" "beq\\t$zero,%7,1b\\n\\t" -"b\\t3f\\n\\t" +"%T6b\\t3f\\n\\t" "2:\\n\\t" "%G6\\n\\t" "3:\\n\\t"; -- 2.48.1
[PATCH 1/5] LoongArch: (NFC) Remove atomic_optab and use amop instead
They are the same. gcc/ChangeLog: * config/loongarch/sync.md (atomic_optab): Remove. (atomic_): Change atomic_optab to amop. (atomic_fetch_): Likewise. --- gcc/config/loongarch/sync.md | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index fd8d732dd67..75b134cd853 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -35,8 +35,6 @@ (define_c_enum "unspec" [ ]) (define_code_iterator any_atomic [plus ior xor and]) -(define_code_attr atomic_optab - [(plus "add") (ior "or") (xor "xor") (and "and")]) ;; This attribute gives the format suffix for atomic memory operations. (define_mode_attr amo [(QI "b") (HI "h") (SI "w") (DI "d")]) @@ -175,7 +173,7 @@ (define_insn "atomic_store" } [(set (attr "length") (const_int 12))]) -(define_insn "atomic_" +(define_insn "atomic_" [(set (match_operand:GPR 0 "memory_operand" "+ZB") (unspec_volatile:GPR [(any_atomic:GPR (match_dup 0) @@ -197,7 +195,7 @@ (define_insn "atomic_add" "amadd%A2.\t$zero,%z1,%0" [(set (attr "length") (const_int 4))]) -(define_insn "atomic_fetch_" +(define_insn "atomic_fetch_" [(set (match_operand:GPR 0 "register_operand" "=&r") (match_operand:GPR 1 "memory_operand" "+ZB")) (set (match_dup 1) -- 2.48.1
[PATCH 0/5] LoongArch: Atomic operation clean-up and micro-optimization
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (5): LoongArch: (NFC) Remove atomic_optab and use amop instead LoongArch: Don't use "+" for atomic_{load,store} "m" constraint LoongArch: Allow using bstrins for masking the address in atomic_test_and_set LoongArch: Don't emit overly-restrictive barrier for LL-SC loops LoongArch: Remove "b 3f" instruction if unneeded gcc/config/loongarch/loongarch.cc | 19 ++--- gcc/config/loongarch/sync.md | 46 ++- 2 files changed, 29 insertions(+), 36 deletions(-) -- 2.48.1
Re: [Patch, fortran] PR96087 - [12/13/14/15 Regression] ICE in gfc_get_symbol_decl, at fortran/trans-decl.c:1575
Hi Paul, the patch looks reasonable to me. Ok for mainline. Just a side-thought: Could it be possible that the for-loop in trans-decl does not find the result? Would an assert after the loop at least give a hint where something went wrong? That's just from reading the code, so if you think that cannot happen, feel free to commit w/o the assert. Regards, Andre On Wed, 22 Jan 2025 14:03:15 + Paul Richard Thomas wrote: > Hi All, > > This patch fixes a double ICE arising from confusion between the dummy > symbol arising from a module function/subroutine interface and the module > procedure itself. In both cases, use of the name is unambiguous, as > explained in the ChangeLog. The testcase contains both the original and the > variant in comment 1. > > Regtests OK with FC41/x86_64 - OK for mainline and later backporting? > > Paul > > Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087] > > 2025-01-22 Paul Thomas > > gcc/fortran > PR fortran/96087 > * trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a > backend decl, it is likely that it has come from a module proc > interface. Look for the formal symbol by name in the containing > proc and use its backend decl. > * trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the > same reason, match the name, rather than the symbol address to > perform the mapping. > > gcc/testsuite/ > PR fortran/96087 > * gfortran.dg/pr96087.f90: New test. -- Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH 3/5] LoongArch: Allow using bstrins for masking the address in atomic_test_and_set
We can use bstrins for masking the address here. As people are already working on LA32R (which lacks bstrins instructions), for future-proofing we check whether (const_int -4) is an and_operand and force it into a register if not. gcc/ChangeLog: * config/loongarch/sync.md (atomic_test_and_set): Use bstrins for masking the address if possible. --- gcc/config/loongarch/sync.md | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 15413e4df1b..5c2eb34f4d4 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -359,12 +359,13 @@ (define_expand "atomic_test_and_set" rtx mem = operands[1]; rtx model = operands[2]; rtx addr = force_reg (Pmode, XEXP (mem, 0)); - rtx tmp_reg = gen_reg_rtx (Pmode); - rtx zero_reg = gen_rtx_REG (Pmode, 0); - + rtx mask = gen_int_mode (-4, Pmode); rtx aligned_addr = gen_reg_rtx (Pmode); - emit_move_insn (tmp_reg, gen_rtx_PLUS (Pmode, zero_reg, GEN_INT (-4))); - emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, tmp_reg)); + + if (!and_operand (mask, Pmode)) +mask = force_reg (Pmode, mask); + + emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, mask)); rtx aligned_mem = change_address (mem, SImode, aligned_addr); set_mem_alias_set (aligned_mem, 0); -- 2.48.1
[PATCH 4/5] LoongArch: Don't emit overly-restrictive barrier for LL-SC loops
For LL-SC loops, if the atomic operation has succeeded, the SC instruction always imply a full barrier, so the barrier we manually inserted only needs to take the account for the failure memorder, not the success memorder (the barrier is skipped with "b 3f" on success anyway). Note that if we use the AMCAS instructions, we indeed need to consider both the success memorder an the failure memorder deciding if "_db" suffix is needed. Thus the semantics of atomic_cas_value_strong and atomic_cas_value_strong_amcas start to be different. To prevent the compiler from being too clever, use a different unspec code for AMCAS instructions. gcc/ChangeLog: * config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AMCAS): New UNSPEC code. (atomic_cas_value_strong): NFC, update the comment to note we only need to consider failure memory order. (atomic_cas_value_strong_amcas): Use UNSPEC_COMPARE_AND_SWAP_AMCAS instead of UNSPEC_COMPARE_AND_SWAP. (atomic_compare_and_swap): Pass failure memorder to gen_atomic_cas_value_strong. (atomic_compare_and_swap): Pass failure memorder to gen_atomic_cas_value_cmp_and_7_si. 
--- gcc/config/loongarch/sync.md | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 5c2eb34f4d4..67cf9f47e5a 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -21,6 +21,7 @@ (define_c_enum "unspec" [ UNSPEC_COMPARE_AND_SWAP + UNSPEC_COMPARE_AND_SWAP_AMCAS UNSPEC_COMPARE_AND_SWAP_ADD UNSPEC_COMPARE_AND_SWAP_SUB UNSPEC_COMPARE_AND_SWAP_AND @@ -238,7 +239,7 @@ (define_insn "atomic_cas_value_strong" (set (match_dup 1) (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ") (match_operand:GPR 3 "reg_or_0_operand" "rJ") - (match_operand:SI 4 "const_int_operand")] ;; mod_s + (match_operand:SI 4 "const_int_operand")] ;; mod_f UNSPEC_COMPARE_AND_SWAP)) (clobber (match_scratch:GPR 5 "=&r"))] "" @@ -286,8 +287,8 @@ (define_insn "atomic_cas_value_strong_amcas" (set (match_dup 1) (unspec_volatile:QHWD [(match_operand:QHWD 2 "reg_or_0_operand" "rJ") (match_operand:QHWD 3 "reg_or_0_operand" "rJ") - (match_operand:SI 4 "const_int_operand")] ;; mod_s -UNSPEC_COMPARE_AND_SWAP))] + (match_operand:SI 4 "const_int_operand")] ;; mod +UNSPEC_COMPARE_AND_SWAP_AMCAS))] "ISA_HAS_LAMCAS" "ori\t%0,%z2,0\n\tamcas%A4.\t%0,%z3,%1" [(set (attr "length") (const_int 8))]) @@ -316,16 +317,14 @@ (define_expand "atomic_compare_and_swap" && is_mm_release (memmodel_base (INTVAL (mod_s mod_s = GEN_INT (MEMMODEL_ACQ_REL); - operands[6] = mod_s; - if (ISA_HAS_LAMCAS) emit_insn (gen_atomic_cas_value_strong_amcas (operands[1], operands[2], operands[3], operands[4], -operands[6])); +mod_s)); else emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2], operands[3], operands[4], - operands[6])); + mod_f)); rtx compare = operands[1]; if (operands[3] != const0_rtx) @@ -399,7 +398,7 @@ (define_insn "atomic_cas_value_cmp_and_7_" (match_operand:GPR 3 "reg_or_0_operand" "rJ") (match_operand:GPR 4 "reg_or_0_operand" "rJ") (match_operand:GPR 5 "reg_or_0_operand" "rJ") - 
(match_operand:SI 6 "const_int_operand")] ;; model + (match_operand:SI 6 "const_int_operand")] ;; mod_f UNSPEC_COMPARE_AND_SWAP)) (clobber (match_scratch:GPR 7 "=&r"))] "" @@ -443,18 +442,16 @@ (define_expand "atomic_compare_and_swap" && is_mm_release (memmodel_base (INTVAL (mod_s mod_s = GEN_INT (MEMMODEL_ACQ_REL); - operands[6] = mod_s; - if (ISA_HAS_LAMCAS) emit_insn (gen_atomic_cas_value_strong_amcas (operands[1], operands[2], operands[3], operands[4], - operands[6])); + mod_s)); else { union loongarch_gen_fn_ptrs generator; generator.fn_7 = gen_atomic_cas_value_cmp_and_7_si; loongarch_expand_atomic_qihi (generator, operands[1], operands[2], - operands[3], operands[4], operands[6]); + operands[3], operands[
Re: [PATCH v3] c++: fix wrong-code with constexpr prvalue opt [PR118396]
On 1/21/25 7:04 PM, Marek Polacek wrote: On Tue, Jan 21, 2025 at 11:00:13AM -0500, Jason Merrill wrote: On 1/21/25 9:54 AM, Jason Merrill wrote: On 1/20/25 5:58 PM, Marek Polacek wrote: @@ -9087,7 +9092,9 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant, return r; else if (non_constant_p && TREE_CONSTANT (r)) r = mark_non_constant (r); - else if (non_constant_p) + else if (non_constant_p + /* Check we are not trying to return the wrong type. */ + || !same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r))) return t; Actually, I guess we also want to give an error if !allow_non_constant. Like so? (It would not have triggered before pre-r15-7103 because there we're called from maybe_constant_init, thus allow_non_constant=true.) Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This patch adds an error in a !allow_non_constant case when the initializer/object types don't match. PR c++/118396 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_outermost_constant_expr): Add an error call when !allow_non_constant. --- gcc/cp/constexpr.cc | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index 9f950ffed74..85d469b055e 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -9092,11 +9092,19 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant, return r; else if (non_constant_p && TREE_CONSTANT (r)) r = mark_non_constant (r); - else if (non_constant_p - /* Check we are not trying to return the wrong type. */ - || !same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r))) + else if (non_constant_p) return t; + /* Check we are not trying to return the wrong type. */ + if (!same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r))) +{ + /* If so, this is not a constant expression. */ + if (!allow_non_constant) + error ("%qE is not a constant expression because it initializes " + "the wrong object", t); Let's print the types as well. 
Maybe "because it initializes a %qT rather than %T"? OK with that change. + return t; +} + if (should_unshare) r = unshare_expr (r); base-commit: 16d778239397b2f70a1e0680c0b82ae6ee98fe9e
[Patch, fortran] PR96087 - [12/13/14/15 Regression] ICE in gfc_get_symbol_decl, at fortran/trans-decl.c:1575
Hi All, This patch fixes a double ICE arising from confusion between the dummy symbol arising from a module function/subroutine interface and the module procedure itself. In both cases, use of the name is unambiguous, as explained in the ChangeLog. The testcase contains both the original and the variant in comment 1. Regtests OK with FC41/x86_64 - OK for mainline and later backporting? Paul Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087] 2025-01-22 Paul Thomas gcc/fortran PR fortran/96087 * trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a backend decl, it is likely that it has come from a module proc interface. Look for the formal symbol by name in the containing proc and use its backend decl. * trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the same reason, match the name, rather than the symbol address to perform the mapping. gcc/testsuite/ PR fortran/96087 * gfortran.dg/pr96087.f90: New test. diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index 4ae22a5584d..97bb0a41858 100644 --- a/gcc/fortran/trans-decl.cc +++ b/gcc/fortran/trans-decl.cc @@ -1722,6 +1722,21 @@ gfc_get_symbol_decl (gfc_symbol * sym) sym->backend_decl = DECL_CHAIN (sym->backend_decl); } + /* Automatic array indices in module procedures need the backend_decl + to be extracted from the procedure formal arglist. */ + if (sym->attr.dummy && !sym->backend_decl) + { + gfc_formal_arglist *f; + for (f = sym->ns->proc_name->formal; f; f = f->next) + { + gfc_symbol *fsym = f->sym; + if (strcmp (sym->name, fsym->name)) + continue; + sym->backend_decl = fsym->backend_decl; + break; + } + } + /* Dummy variables should already have been created. 
*/ gcc_assert (sym->backend_decl); diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index bef49d32a58..1bf46c616bb 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -5099,7 +5099,7 @@ gfc_apply_interface_mapping_to_expr (gfc_interface_mapping * mapping, /* TODO Find out why the condition on expr->symtree had to be moved into the loop rather than being outside it, as originally. */ for (sym = mapping->syms; sym; sym = sym->next) -if (expr->symtree && sym->old == expr->symtree->n.sym) +if (expr->symtree && !strcmp (sym->old->name, expr->symtree->n.sym->name)) { if (sym->new_sym->n.sym->backend_decl) expr->symtree = sym->new_sym; diff --git a/gcc/testsuite/gfortran.dg/pr96087.f90 b/gcc/testsuite/gfortran.dg/pr96087.f90 new file mode 100644 index 000..6c75d4f0cf2 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr96087.f90 @@ -0,0 +1,46 @@ +! { dg-do run } + +module m + interface + module function f(a, n, b) result(z) + integer, intent(in) :: n + real :: z(n + 1) + real :: a, b + end + end interface +contains + module procedure f + integer :: i + do i = 1, size(z) +z(i) = real(i) + end do + end procedure +end + +! Comment 1 +module n + interface + module subroutine g(n, z) + integer, intent(in) :: n + real :: z(n) + end + end interface +contains + module procedure g + z = 1 + if (int (sum (z)) /= n) stop 1 + end procedure +end + + use m + use n + real, allocatable :: r(:) + integer :: i = 2 + r = f (1.0, i+1, 2.0) + if (any (r .ne. [(real(i), i = 1,4)])) stop 2 + if (any (f (3.0, 1, 4.0) .ne. [(real(i), i = 1,2)])) stop 3 + + r = [(real (i), i = 10,20)] + call g (5, r) + if (int (sum (r)) /= (sum ([(i, i = 15,20)]) + 5)) stop 4 +end
Re: [PATCH] c++/modules: Fix exporting temploid friends in header units [PR118582]
On 1/22/25 6:30 AM, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? OK. -- >8 -- When we started streaming the bit to handle merging of imported temploid friends in r15-2807, I unthinkingly only streamed it in the '!state->is_header ()' case. This patch reworks the streaming logic to ensure that this data is always streamed, including for unique entities (in case that ever comes up somehow). This does make the streaming slightly less efficient, as functions and types will need an extra byte, but this doesn't appear to make a huge difference to the size of the resulting module; the 'std' module on my machine grows by 0.2% from 30671136 to 30730144 bytes. PR c++/118582 gcc/cp/ChangeLog: * module.cc (trees_out::decl_value): Always stream imported_temploid_friends information. (trees_in::decl_value): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/pr118582_a.H: New test. * g++.dg/modules/pr118582_b.H: New test. * g++.dg/modules/pr118582_c.H: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 47 +++ gcc/testsuite/g++.dg/modules/pr118582_a.H | 16 gcc/testsuite/g++.dg/modules/pr118582_b.H | 6 +++ gcc/testsuite/g++.dg/modules/pr118582_c.H | 5 +++ 4 files changed, 49 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_a.H create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_b.H create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_c.H diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 813c1436141..17215594fd3 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -2791,7 +2791,7 @@ static keyed_map_t *keyed_table; /* Instantiations of temploid friends imported from another module need to be attached to the same module as the temploid. This maps - these decls to the temploid they are instantiated them, as there is + these decls to the temploid they are instantiated from, as there is no other easy way to get this information. 
*/ static GTY((cache)) decl_tree_cache_map *imported_temploid_friends; @@ -7961,7 +7961,6 @@ trees_out::decl_value (tree decl, depset *dep) } merge_kind mk = get_merge_kind (decl, dep); - bool is_imported_temploid_friend = imported_temploid_friends->get (decl); if (CHECKING_P) { @@ -7998,10 +7997,6 @@ trees_out::decl_value (tree decl, depset *dep) is_attached = true; bits.b (is_attached); - - /* Also tell the importer whether this is an imported temploid -friend, which has implications for merging. */ - bits.b (is_imported_temploid_friend); } bits.b (dep && dep->has_defn ()); } @@ -8087,6 +8082,16 @@ trees_out::decl_value (tree decl, depset *dep) tree container = decl_container (decl); unsigned tpl_levels = 0; + /* Also tell the importer whether this is a temploid friend attached + to a different module (which has implications for merging), so that + importers can reconstruct this information on stream-in. */ + if (TREE_CODE (inner) == FUNCTION_DECL || TREE_CODE (inner) == TYPE_DECL) +{ + tree* temploid_friend_slot = imported_temploid_friends->get (decl); + gcc_checking_assert (!temploid_friend_slot || *temploid_friend_slot); + tree_node (temploid_friend_slot ? *temploid_friend_slot : NULL_TREE); +} + { auto wmk = make_temp_override (dep_hash->writing_merge_key, true); if (decl != inner) @@ -8182,14 +8187,6 @@ trees_out::decl_value (tree decl, depset *dep) } } - if (is_imported_temploid_friend) -{ - /* Write imported temploid friends so that importers can reconstruct -this information on stream-in. 
*/ - tree* slot = imported_temploid_friends->get (decl); - tree_node (*slot); -} - bool is_typedef = false; if (!type && TREE_CODE (inner) == TYPE_DECL) { @@ -8266,7 +8263,6 @@ trees_in::decl_value () { int tag = 0; bool is_attached = false; - bool is_imported_temploid_friend = false; bool has_defn = false; unsigned mk_u = u (); if (mk_u >= MK_hwm || !merge_kind_name[mk_u]) @@ -8287,10 +8283,7 @@ trees_in::decl_value () { bits_in bits = stream_bits (); if (!(mk & MK_template_mask) && !state->is_header ()) - { - is_attached = bits.b (); - is_imported_temploid_friend = bits.b (); - } + is_attached = bits.b (); has_defn = bits.b (); } @@ -8385,6 +8378,12 @@ trees_in::decl_value () tree container = decl_container (); unsigned tpl_levels = 0; + /* If this is an imported temploid friend, get the owning decl its + attachment is determined by (or NULL_TREE otherwise). */ + tree temploid_friend = NULL_TREE; + if (TREE_CODE (inner) == FUNCTION_DECL
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
> On Jan 22, 2025, at 11:22, Martin Uecker wrote: > > > Hello Michael, > > On Wednesday, 2025-01-22 at 16:54 +0100, Michael Matz wrote: >> On Wed, 22 Jan 2025, Martin Uecker wrote: >> > So you do not need to look further. But maybe I am missing something > else. Like ... >> Note further that you may have '{ .y[1][3].z }', which is still not a >> designation, but an expression under your proposal, whereas >> '{ .y[1][3].z = 1 }' would remain a designation. This shows that you >> now need arbitrary look-ahead to disambiguate the two. A Very Bad Idea. ... this? >>> >>> In .y[1][3].z after .y you can decide whether y is a member of the >>> struct being initialized. If it is, it is a designator and if not >>> it must be an expression. >> >> If y is not a member it must be an expression, true. But if it's a member >> you don't know, it may be a designation or an expression. > > In an initializer I know all the members. I am not familiar with the parser, so I am a little confused about the following: Suppose we have: struct foo { int z; float f; } struct bar { char *array __attribute__ ((counted_by (.y[1][3].z + 4))); struct foo y[5][10]; } So, in the above, when parsing the expression inside counted_by, can the current parser easily be extended to parse it? Thanks. Qing > > Martin > >
[PATCH] ipa-cp: Perform operations in the appropriate types (PR 118097)
Hi, one of the testcases from PR 118097 and the one from PR 118535 show that the fix to PR 118138 was incomplete. We must not only make sure that (intermediate) results of operations performed by IPA-CP are fold_converted to the type of the destination formal parameter but we also must decouple these types from the ones in which operations are performed. This patch does that, even though we do not store or stream the operation types; instead we simply limit ourselves to tcc_comparisons and operations for which the first operand and the result are of the same type as determined by expr_type_first_operand_type_p. If we wanted to go beyond these, we would indeed need to store/stream the respective operation type. ipa_value_from_jfunc needs an additional check that res_type is not NULL because it is not called just from within IPA-CP (where we know we have a destination lattice slot belonging to a defined parameter) but also from inlining, ipa-fnsummary and ipa-modref where it is used to examine a call to a function with variadic arguments and we do not have types for the unknown parameters. But we cannot really work with those or estimate any benefits when it comes to them, so ignoring them should be OK. Even after this patch, ipa_get_jf_arith_result has a parameter called res_type in which it performs operations for aggregate jump functions, where we do not allow type conversions when constructing the jump functions and the type is the type of the stored data. In GCC 16, we could relax this and allow conversions like for scalars. Bootstrapped, LTO-bootstrapped and tested on x86_64-linux. OK for master? Thanks, Honza gcc/ChangeLog: 2025-01-20 Martin Jambor PR ipa/118097 * ipa-cp.cc (ipa_get_jf_arith_result): Adjust comment. (ipa_get_jf_pass_through_result): Removed. (ipa_value_from_jfunc): Use directly ipa_get_jf_arith_result, do not specify operation type but make sure we check and possibly convert the result.
(get_val_across_arith_op): Remove the last parameter, always pass NULL_TREE to ipa_get_jf_arith_result in its last argument. (propagate_vals_across_arith_jfunc): Do not pass res_type to get_val_across_arith_op. (propagate_vals_across_pass_through): Add checking assert that parm_type is not NULL. gcc/testsuite/ChangeLog: 2025-01-20 Martin Jambor PR ipa/118097 * gcc.dg/ipa/pr118097.c: New test. * gcc.dg/ipa/pr118535.c: Likewise. * gcc.dg/ipa/ipa-notypes-1.c: Likewise. --- gcc/ipa-cp.cc| 46 ++-- gcc/testsuite/gcc.dg/ipa/ipa-notypes-1.c | 17 + gcc/testsuite/gcc.dg/ipa/pr118097.c | 23 gcc/testsuite/gcc.dg/ipa/pr118535.c | 17 + 4 files changed, 75 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-notypes-1.c create mode 100644 gcc/testsuite/gcc.dg/ipa/pr118097.c create mode 100644 gcc/testsuite/gcc.dg/ipa/pr118535.c diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index d89324a0077..68959f2677b 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -1467,11 +1467,10 @@ ipacp_value_safe_for_type (tree param_type, tree value) return NULL_TREE; } -/* Return the result of a (possibly arithmetic) operation on the constant - value INPUT. OPERAND is 2nd operand for binary operation. RES_TYPE is - the type of the parameter to which the result is passed. Return - NULL_TREE if that cannot be determined or be considered an - interprocedural invariant. */ +/* Return the result of a (possibly arithmetic) operation on the constant value + INPUT. OPERAND is 2nd operand for binary operation. RES_TYPE is the type + in which any operation is to be performed. Return NULL_TREE if that cannot + be determined or be considered an interprocedural invariant. */ static tree ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand, @@ -1513,21 +1512,6 @@ ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand, return res; } -/* Return the result of a (possibly arithmetic) pass through jump function - JFUNC on the constant value INPUT. 
RES_TYPE is the type of the parameter - to which the result is passed. Return NULL_TREE if that cannot be - determined or be considered an interprocedural invariant. */ - -static tree -ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input, - tree res_type) -{ - return ipa_get_jf_arith_result (ipa_get_jf_pass_through_operation (jfunc), - input, - ipa_get_jf_pass_through_operand (jfunc), - res_type); -} - /* Return the result of an ancestor jump function JFUNC on the constant value INPUT. Return NULL_TREE if that cannot be determined. */ @@ -1595,7 +1579,14 @@ ipa_value_from_jfunc (class ipa_node_params *info
Re: [PATCH] builtins: Store unspecified value to *exp for inf/nan [PR114877]
On 1/22/25 2:48 AM, Jakub Jelinek wrote: Hi! The fold_builtin_frexp folding for NaN/Inf just returned the first argument with evaluating second arguments side-effects, rather than storing something to what the second argument points to. The PR argues that the C standard requires the function to store something there but what exactly is stored is unspecified, so not storing there anything can result in UB if the value isn't initialized and is read later. glibc and newlib store there 0, musl apparently doesn't store anything. The following patch stores there zero (or would you prefer storing there some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form in assembly though) and adjusts the test so that it doesn't rely on not storing there anything but instead checks for -Wmaybe-uninitialized warning to find out that something has been stored there. Unfortunately I had to disable the NaN tests for -O0, while we can fold __builtin_isnan (__builtin_nan ("")) at compile time, we can't fold __builtin_isnan ((i = 0, __builtin_nan (""))) at compile time. fold_builtin_classify uses just tree_expr_nan_p and if that isn't true (because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg), it does arg = builtin_save_expr (arg); return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg); and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and nothing propagates the NAN to the comparison. I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR: added and recurse on the second argument, but that feels like stage1 material to me if we want to do that at all. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-01-22 Jakub Jelinek PR middle-end/114877 * builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases like rvc_zero, return passed in arg and set *exp = 0. * gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as dg-additional-options. (bar): New function. 
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether nothing has been stored to what the second argument points to, but instead that something has been stored there, whatever it is. (main): Temporarily don't enable the nan tests for -O0. I think storing zero is the way to go. I can't see a compelling reason for any other value. OK for the trunk. Jeff
[patch,avr] ad PR117726: Tweak 32-bit logical shifts of 25...30 for -Oz
As it turns out, logical 32-bit shifts with an offset of 25..30 can be performed in 7 instructions or less. This beats the 7 instructions required for the default code of a shift loop. Plus, with zero overhead, these cases can be 3-operand. This is only relevant for -Oz because with -Os, 3op shifts are split with -msplit-bit-shift (which is not performed with -Oz). Passes without new regressions. Ok for trunk? Johann -- AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz. As it turns out, logical 32-bit shifts with an offset of 25..30 can be performed in 7 instructions or less. This beats the 7 instructions required for the default code of a shift loop. Plus, with zero overhead, these cases can be 3-operand. This is only relevant for -Oz because with -Os, 3op shifts are split with -msplit-bit-shift (which is not performed with -Oz). PR target/117726 gcc/ * config/avr/avr.cc (avr_ld_regno_p): New function. (ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak. (lshrsi3_out): Same. (avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs. * config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const): Add "r,r,C4L" alternative. (lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative. * config/avr/constraints.md (C4R, C4L): New. gcc/testsuite/ * gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS): Turn one option variant into -Oz. diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc index e5a5aa34ec0..8628a438ab5 100644 --- a/gcc/config/avr/avr.cc +++ b/gcc/config/avr/avr.cc @@ -418,6 +418,15 @@ avr_adiw_reg_p (rtx reg) } +/* Return true iff REGNO is in R16...R31.
*/ + +static bool +avr_ld_regno_p (int regno) +{ + return TEST_HARD_REG_CLASS (LD_REGS, regno); +} + + static bool ra_in_progress () { @@ -7397,17 +7406,20 @@ ashlsi3_out (rtx_insn *insn, rtx operands[], int *plen) { if (CONST_INT_P (operands[2])) { + int off = INTVAL (operands[2]); int reg0 = true_regnum (operands[0]); int reg1 = true_regnum (operands[1]); bool reg1_unused_after = reg_unused_after (insn, operands[1]); - + bool scratch_p = (GET_CODE (PATTERN (insn)) == PARALLEL + && XVECLEN (PATTERN (insn), 0) == 3 + && REG_P (operands[3])); if (plen) *plen = 0; - switch (INTVAL (operands[2])) + switch (off) { default: - if (INTVAL (operands[2]) < 32) + if (off < 32) break; return AVR_HAVE_MOVW @@ -7461,11 +7473,58 @@ ashlsi3_out (rtx_insn *insn, rtx operands[], int *plen) "mov %D0,%B1" CR_TAB "clr %B0" CR_TAB "clr %A0", operands, plen, 4); + case 30: + if (AVR_HAVE_MUL && scratch_p) + return avr_asm_len ("ldi %3,1<<6" CR_TAB +"mul %3,%A1"CR_TAB +"mov %D0,r0"CR_TAB +"clr __zero_reg__" CR_TAB +"clr %C0" CR_TAB +"clr %B0" CR_TAB +"clr %A0", operands, plen, 7); + // Fallthrough + + case 28: + case 29: + { + const bool ld_reg0_p = avr_ld_regno_p (reg0 + 3); // %D0 + const bool ld_reg1_p = avr_ld_regno_p (reg1 + 0); // %A1 + if (ld_reg0_p + || (ld_reg1_p && reg1_unused_after) + || scratch_p) + { + if (ld_reg0_p) + avr_asm_len ("mov %D0,%A1"CR_TAB + "swap %D0" CR_TAB + "andi %D0,0xf0", operands, plen, 3); + else if (ld_reg1_p && reg1_unused_after) + avr_asm_len ("swap %A1" CR_TAB + "andi %A1,0xf0" CR_TAB + "mov %D0,%A1", operands, plen, 3); + else + avr_asm_len ("mov %D0,%A1"CR_TAB + "swap %D0" CR_TAB + "ldi %3,0xf0"CR_TAB + "and %D0,%3", operands, plen, 4); + for (int i = 28; i < off; ++i) + avr_asm_len ("lsl %D0", operands, plen, 1); + return avr_asm_len ("clr %C0" CR_TAB +"clr %B0" CR_TAB +"clr %A0", operands, plen, 3); + } + } + // Fallthrough + case 24: - return avr_asm_len ("mov %D0,%A1" CR_TAB - "clr %C0" CR_TAB + case 25: + case 26: + case 27: + avr_asm_len ("mov 
%D0,%A1", operands, plen, 1); + for (int i = 24; i < off; ++i) + avr_asm_len ("lsl %D0", operands, plen, 1); + return avr_asm_len ("clr %C0" CR_TAB "clr %B0" CR_TAB - "clr %A0", operands, plen, 4); + "clr %A0", operands, plen, 3); case 31: return AVR_HAVE_MOVW ? avr_asm_len ("bst %A1,0"CR_TAB @@ -8298,17 +8357,20 @@ lshrsi3_out (rtx_insn *insn, rtx operands[], int *plen) { if (CONST_INT_P (operands[2])) { + int off = INTVAL (operands[2]); int reg0 = true_regnum (operands[0]); int reg1 = true_regnum (operands[1]); bool reg1_unused_after = reg_unused_after (insn, operands[1]); - + bool scratch_p = (GET_CODE (PATTERN (insn)) == PARALLEL + && XVECLEN (PATTERN (insn), 0) == 3 + && REG_P (operands[3])); if (plen) *plen = 0; - switch (INTVAL (operands[2])) + switch (off)
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
Hi Qing, sorry, another idea I noted down some time ago which I would like to mention. > > > > - use it only in limited contexts where you do not need to know > > the type (e.g. this works for goto labels) or for a basic > > counted_by attribute that only takes an identifier as we have it now. A version of this is to allow taking the address of the member, but give it the type void*. The user can then convert it to the right type (and we could warn later if the cast goes to the wrong type). struct foo { int z; float f; } typedef struct foo counter_t[5][10]; struct bar { char *array __attribute__ ((counted_by ((*(counter_t)&.y)[1][3].z + 4))); struct foo y[5][10]; } This could also be used together with the function idea: struct bar { char *array __attribute__ ((counted_by (bar_count (&.y)))); struct foo y[5][10]; } This would be simple to implement and do the job. Martin
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
On Wednesday, 2025-01-22 at 16:37 +, Qing Zhao wrote: > > > On Jan 22, 2025, at 11:22, Martin Uecker wrote: > > > > > > Hello Michael, > > > > On Wednesday, 2025-01-22 at 16:54 +0100, Michael Matz wrote: > > > On Wed, 22 Jan 2025, Martin Uecker wrote: > > > > > > > > > So you do not need to look further. But maybe I am missing > > > > > > something > > > > > > else. > > > > > > > > > > Like ... > > > > > > > > > > > > Note further that you may have '{ .y[1][3].z }', which is still > > > > > > > not a > > > > > > > designation, but an expression under your proposal, whereas > > > > > > > '{ .y[1][3].z = 1 }' would remain a designation. This shows that > > > > > > > you > > > > > > > now need arbitrary look-ahead to disambiguate the two. A Very > > > > > > > Bad Idea. > > > > > > > > > > ... this? > > > > > > > > In .y[1][3].z after .y you can decide whether y is a member of the > > > > struct being initialized. If it is, it is a designator and if not > > > > it must be an expression. > > > > > > If y is not a member it must be an expression, true. But if it's a > > > member > > > you don't know, it may be a designation or an expression. > > > > In an initializer I know all the members. > > I am not familiar with the parser, so, I am a little confused about the > following: > > Suppose we have: > > struct foo { > int z; > float f; > } > > struct bar { > char *array __attribute__ ((counted_by (.y[1][3].z + 4))); > struct foo y[5][10]; > } > > So, in the above, when parsing the above expression inside counted_by, can the > current parser easily be extended to parse it? No, I don't think this can be done easily. The issue is that you do not know the declaration for y because it hasn't been parsed yet. If you forward reference some struct member, you have several possibilities: - use it only in limited contexts where you do not need to know the type (e.g.
this works for goto labels) or for a basic counted_by attribute that only takes an identifier as we have it now. - simply assume it has a certain type (size_t as is proposed in the WG14 paper Joseph mentioned) and fail later if it does not. Both options would rule out the construct above (but there could be workarounds). Other alternatives are: - you have some kind of forward declaration (as we have for parameters as a GNU extension). In the context of C, this is the cleanest solution but either requires forward declaring the full struct (which can be done in C23) or new syntax for only forward declaring the member. - or you use some delayed parsing where you store away the tokens and parse it later when all structure members are done. I think this is a highly problematic approach for a variety of reasons. Martin > > thanks. > > Qing > > > > Martin > > > > >
[PATCH] [ifcombine] out-of-bounds bitfield refs can trap [PR118514]
On Jan 21, 2025, Richard Biener wrote: > So - your fix looks almost good, I'd adjust it to >> +case BIT_FIELD_REF: >> + if (DECL_P (TREE_OPERAND (expr, 0)) >> + && !bit_field_ref_in_bounds_p (expr)) >> + return true; >> + /* Fall through. */ > OK if that works. It does. But my intuition isn't happy with this. I don't get why only DECLs need to be checked here, and why we should check the type size rather than the decl size if we're only checking decls. I have another patch coming up that doesn't raise concerns for me, so I'll hold off from installing the above, in case you also prefer the other one. [ifcombine] out-of-bounds bitfield refs can trap [PR118514] Check that BIT_FIELD_REFs of DECLs are in range before deciding they don't trap. Check that a replacement bitfield load is as trapping as the replaced load. for gcc/ChangeLog PR tree-optimization/118514 * tree-eh.cc (bit_field_ref_in_bounds_p): New. (tree_could_trap_p) : Call it. * gimple-fold.cc (make_bit_field_load): Check trapping status of replacement load against original load. for gcc/testsuite/ChangeLog PR tree-optimization/118514 * gcc.dg/field-merge-23.c: New. --- gcc/gimple-fold.cc|5 + gcc/testsuite/gcc.dg/field-merge-23.c | 19 +++ gcc/tree-eh.cc| 29 - 3 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/field-merge-23.c diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index 3c971a29ef045..01c4d076af26c 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -7859,6 +7859,11 @@ make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type, gimple *new_stmt = gsi_stmt (i); if (gimple_has_mem_ops (new_stmt)) gimple_set_vuse (new_stmt, reaching_vuse); + gcc_checking_assert (! 
(gimple_assign_load_p (point) + && gimple_assign_load_p (new_stmt)) + || (tree_could_trap_p (gimple_assign_rhs1 (point)) + == tree_could_trap_p (gimple_assign_rhs1 + (new_stmt)))); } gimple_stmt_iterator gsi = gsi_for_stmt (point); diff --git a/gcc/testsuite/gcc.dg/field-merge-23.c b/gcc/testsuite/gcc.dg/field-merge-23.c new file mode 100644 index 0..d60f76206ebea --- /dev/null +++ b/gcc/testsuite/gcc.dg/field-merge-23.c @@ -0,0 +1,19 @@ +/* { dg-do run } */ +/* { dg-options "-O" } */ + +/* PR tree-optimization/118514 */ + +/* Check that we don't pull optimized references that could trap out of + loops. */ + +int a, c = 1; +unsigned char b[1], d; +int main() { + while (a || !c) { +signed char e = b[10]; +d = e < 0 || b[10] > 1; +if (d) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc index 1033b124e4df3..27a4b2b5b1665 100644 --- a/gcc/tree-eh.cc +++ b/gcc/tree-eh.cc @@ -2646,6 +2646,29 @@ range_in_array_bounds_p (tree ref) return true; } +/* Return true iff EXPR, a BIT_FIELD_REF, accesses a bit range that is known to + be in bounds for the referred operand type. */ + +static bool +bit_field_ref_in_bounds_p (tree expr) +{ + tree size_tree; + poly_uint64 size_max, min, wid, max; + + size_tree = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (expr, 0))); + if (!size_tree || !poly_int_tree_p (size_tree, &size_max)) +return false; + + min = bit_field_offset (expr); + wid = bit_field_size (expr); + max = min + wid; + if (maybe_lt (max, min) + || maybe_lt (size_max, max)) +return false; + + return true; +} + /* Return true if EXPR can trap, as in dereferencing an invalid pointer location or floating point arithmetic. C.f. the rtl version, may_trap_p. This routine expects only GIMPLE lhs or rhs input. */ @@ -2688,10 +2711,14 @@ tree_could_trap_p (tree expr) restart: switch (code) { +case BIT_FIELD_REF: + if (DECL_P (TREE_OPERAND (expr, 0)) && !bit_field_ref_in_bounds_p (expr)) + return true; + /* Fall through.
*/ + case COMPONENT_REF: case REALPART_EXPR: case IMAGPART_EXPR: -case BIT_FIELD_REF: case VIEW_CONVERT_EXPR: case WITH_SIZE_EXPR: expr = TREE_OPERAND (expr, 0); -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving "normal" is *not* inclusive
Re: [PATCH] testsuite: Only run test if alarm is available
On Jan 19, 2025, at 12:47 PM, Torbjorn SVENSSON wrote: > > On 2025-01-19 21:20, Andrew Pinski wrote: >> On Sun, Jan 19, 2025 at 12:17 PM Torbjörn SVENSSON >> wrote: >>> >>> Ok for trunk? >>> >>> -- >>> >>> Most baremetal toolchains will not have an implementation for alarm and >>> sigaction as they are target specific. >>> For arm-none-eabi with newlib, function signatures are exposed, but >>> there is no implementation and thus the test cases cause an undefined >>> symbol link error. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.dg/pr78185.c: Remove dg-do and replace >>> with dg-require-effective-target of signal and alarm. >>> * gcc.dg/pr116906-1.c: Likewise. >>> * gcc.dg/pr116906-2.c: Likewise. >>> * gcc.dg/vect/pr101145inf.c: Use effective-target alarm. >>> * gcc.dg/vect/pr101145inf_1.c: Likewise. >>> * lib/target-supports.exp (check_effective_target_alarm): New. >>> >>> gcc/ChangeLog: >>> >>> * doc/sourcebuild.texi (Effective-Target Keywords): Document >>> 'alarm'. >>> >>> Signed-off-by: Torbjörn SVENSSON >>> --- >>> gcc/doc/sourcebuild.texi | 3 +++ >>> gcc/testsuite/gcc.dg/pr116906-1.c | 3 ++- >>> gcc/testsuite/gcc.dg/pr116906-2.c | 3 ++- >>> gcc/testsuite/gcc.dg/pr78185.c| 3 ++- >>> gcc/testsuite/gcc.dg/vect/pr101145inf.c | 1 + >>> gcc/testsuite/gcc.dg/vect/pr101145inf_1.c | 1 + >>> gcc/testsuite/lib/target-supports.exp | 23 +++ >>> 7 files changed, 34 insertions(+), 3 deletions(-) >>> >>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi >>> index b5c1b23e527..98ede70f23c 100644 >>> --- a/gcc/doc/sourcebuild.texi >>> +++ b/gcc/doc/sourcebuild.texi >>> @@ -2808,6 +2808,9 @@ both scalar and vector modes. >>> @subsubsection Environment attributes >>> >>> @table @code >>> +@item alarm >>> +Target supports @code{alarm}. >>> + >>> @item c >>> The language for the compiler under test is C.
>>> >>> diff --git a/gcc/testsuite/gcc.dg/pr116906-1.c >>> b/gcc/testsuite/gcc.dg/pr116906-1.c >>> index 27b1fdae02b..7187507a60d 100644 >>> --- a/gcc/testsuite/gcc.dg/pr116906-1.c >>> +++ b/gcc/testsuite/gcc.dg/pr116906-1.c >>> @@ -1,4 +1,5 @@ >>> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */ >>> +/* { dg-require-effective-target alarm } */ >>> +/* { dg-require-effective-target signal } */ >>> /* { dg-options "-O2" } */ >>> >>> #include >>> diff --git a/gcc/testsuite/gcc.dg/pr116906-2.c >>> b/gcc/testsuite/gcc.dg/pr116906-2.c >>> index 3478771678c..41a352bf837 100644 >>> --- a/gcc/testsuite/gcc.dg/pr116906-2.c >>> +++ b/gcc/testsuite/gcc.dg/pr116906-2.c >>> @@ -1,4 +1,5 @@ >>> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */ >>> +/* { dg-require-effective-target alarm } */ >>> +/* { dg-require-effective-target signal } */ >>> /* { dg-options "-O2 -fno-tree-ch" } */ >>> >>> #include >>> diff --git a/gcc/testsuite/gcc.dg/pr78185.c b/gcc/testsuite/gcc.dg/pr78185.c >>> index d7781b2080f..ada8b1b9f90 100644 >>> --- a/gcc/testsuite/gcc.dg/pr78185.c >>> +++ b/gcc/testsuite/gcc.dg/pr78185.c >>> @@ -1,4 +1,5 @@ >>> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */ >>> +/* { dg-require-effective-target alarm } */ >>> +/* { dg-require-effective-target signal } */ >>> /* { dg-options "-O" } */ >>> >>> #include >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf.c >>> b/gcc/testsuite/gcc.dg/vect/pr101145inf.c >>> index aa598875aa5..70aea94b6e0 100644 >>> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf.c >>> +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf.c >>> @@ -1,3 +1,4 @@ >>> +/* { dg-require-effective-target alarm } */ >>> /* { dg-require-effective-target signal } */ >>> /* { dg-additional-options "-O3" } */ >>> #include >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c >>> b/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c >>> index 0465788c3cc..fe008284e1d 100644 >>> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c >>> +++ 
b/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c >>> @@ -1,3 +1,4 @@ >>> +/* { dg-require-effective-target alarm } */ >>> /* { dg-require-effective-target signal } */ >>> /* { dg-additional-options "-O3" } */ >>> #include >>> diff --git a/gcc/testsuite/lib/target-supports.exp >>> b/gcc/testsuite/lib/target-supports.exp >>> index 939ef3a4119..93795a7e27f 100644 >>> --- a/gcc/testsuite/lib/target-supports.exp >>> +++ b/gcc/testsuite/lib/target-supports.exp >>> @@ -14255,3 +14255,26 @@ proc add_options_for_nvptx_alloca_ptx { flags } { >>> >>> return $flags >>> } >>> + >>> +# Return true if alarm is supported on the target. >>> + >>> +proc check_effective_target_alarm { } { >> Maybe A small optimization is return false here if signal is not supported. >> That is: >> if ![check_effective_target_signal] { >> return 0 >> } > > Sure, I'll add that. > > Is it okay with this change? > Or should I send a v2 with this? Ok.
Re: [PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]
On 9/10/24 2:29 PM, Jakub Jelinek wrote: Hi! The following patch on top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html patch adds CWG 2867 support for namespace locals. Those vars are just pushed into {static,tls}_aggregates chain, then pruned from those lists, separated by priority and finally emitted into the corresponding dynamic initialization functions. The patch adds two flags used on the TREE_LIST nodes in those lists, one marks the structured binding base variable and/or associated ref extended temps, another marks the vars initialized using get methods. The flags are preserved across the pruning, for splitting into by priority all associated decls of a structured binding using tuple* are forced into the same priority as the first one, and finally when actually emitting code, CLEANUP_POINT_EXPRs are disabled in the base initializer(s) and code from the bases and non-bases together is wrapped into a single CLEANUP_POINT_EXPR. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Note, I haven't touched the module handling; from what I can see, prune_vars_needing_no_initialization is destructive to the {static,tls}_aggregates lists (keeps the list NULL at the end or if there are errors or it contains some DECL_EXTERNAL decls, keeps in there just those, not the actual vars that need dynamic initialization) and the module writing is done only afterwards, so I think it could work reasonably only if header_module_p (). Can namespace scope structured bindings appear in header_module_p () or !header_module_p () modules? How would a testcase using them look like? Especially when structured bindings can't be extern nor templates nor inline there can be just one definition, so the module would need to be included in a single file, no? 
In any case, the patch shouldn't make the modules case any worse, it just adds TREE_LIST flags which will not be streamed for modules and so if one can use structured bindings in modules, possibly CWG 2867 would be not fixed for those but nothing worse than that. 2024-09-10 Jakub Jelinek PR c++/115769 gcc/cp/ * cp-tree.h (STATIC_INIT_DECOMP_BASE_P): Define. (STATIC_INIT_DECOMP_NONBASE_P): Define. * decl.cc (cp_finish_decl): Mark nodes in {static,tls}_aggregates with * decl2.cc (decomp_handle_one_var, decomp_finalize_var_list): New functions. (emit_partial_init_fini_fn): Use them. (prune_vars_needing_no_initialization): Clear STATIC_INIT_DECOMP_*BASE_P flags if needed. (partition_vars_for_init_fini): Use same priority for consecutive STATIC_INIT_DECOMP_*BASE_P vars and propagate those flags to new TREE_LISTs when possible. Formatting fix. (handle_tls_init): Use decomp_handle_one_var and decomp_finalize_var_list functions. gcc/testsuite/ * g++.dg/DRs/dr2867-5.C: New test. * g++.dg/DRs/dr2867-6.C: New test. * g++.dg/DRs/dr2867-7.C: New test. * g++.dg/DRs/dr2867-8.C: New test. 
--- gcc/cp/cp-tree.h.jj 2024-09-07 09:31:20.601484156 +0200 +++ gcc/cp/cp-tree.h2024-09-09 15:53:44.924112247 +0200 @@ -470,6 +470,7 @@ extern GTY(()) tree cp_global_trees[CPTI BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P (in BASELINK) BIND_EXPR_VEC_DTOR (in BIND_EXPR) ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (in ATOMIC_CONSTR) + STATIC_INIT_DECOMP_BASE_P (in the TREE_LIST for {static,tls}_aggregates) 2: IDENTIFIER_KIND_BIT_2 (in IDENTIFIER_NODE) ICS_THIS_FLAG (in _CONV) DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (in VAR_DECL) @@ -489,6 +490,8 @@ extern GTY(()) tree cp_global_trees[CPTI IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR) PACK_EXPANSION_AUTO_P (in *_PACK_EXPANSION) contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT) + STATIC_INIT_DECOMP_NONBASE_P (in the TREE_LIST + for {static,tls}_aggregates) 3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR) ICS_BAD_FLAG (in _CONV) FN_TRY_BLOCK_P (in TRY_BLOCK) @@ -5947,6 +5950,21 @@ extern bool defer_mangling_aliases; extern bool flag_noexcept_type; +/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic + initialization of namespace scope structured binding base or related + extended ref init temps. Temporaries from the initialization of + STATIC_INIT_DECOMP_BASE_P dynamic initializers should be destroyed only + after the last STATIC_INIT_DECOMP_NONBASE_P dynamic initializer following + it. */ +#define STATIC_INIT_DECOMP_BASE_P(NODE) \ + TREE_LANG_FLAG_1 (TREE_LIST_CHECK (NODE)) + +/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic + initialization of namespace scope structured binding non-base + variable using get. */ +#define STATIC_INIT_DECOMP_NONBASE_P(NODE) \ + TREE_LANG_FLAG_2 (TREE_LIST_CHECK (NODE)) + /* A list of namespace-sco
Re: [PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]
On 1/3/25 6:39 AM, Jakub Jelinek wrote:

On Thu, Dec 19, 2024 at 11:56:10AM -0500, Jason Merrill wrote: Looks right.

So, I've tried to construct a testcase, but turns out modules.cc just doesn't handle structured bindings at namespace scope at all, so it is premature to try to get the ordering right in it. The first patch contains the attempt to turn dr2867-5.C testcase into a module header testcase with a fix for the first ICE I've run into (but it ICEs immediately elsewhere, the assumption that all namespace scope decls have names is all around).

How about giving it a name? That would seem useful for debugging as well.

The second patch is much simpler testcase which doesn't care about ordering of anything and ICEs without/with the other patches the same.

Should be all !DECL_NAME (decl) MK_unique? Or just DECL_DECOMPOSITION_P (decl) && !DECL_NAME (decl)?

Structured bindings can't be redeclared, so I guess they are unique.

Jakub
Re: [PATCH v3 3/6] c++: Fix ABI for lambdas declared in alias templates [PR116568]
On 1/16/25 7:24 PM, Nathaniel Shead wrote:

On Thu, Jan 16, 2025 at 07:09:33PM -0500, Jason Merrill wrote:

On 1/6/25 7:22 AM, Nathaniel Shead wrote:

I'm not 100% sure I've handled this properly, any feedback welcome. In particular, maybe I should use `DECL_IMPLICIT_TYPEDEF_P` in the mangling logic instead of `!TYPE_DECL_ALIAS_P`? They both seem to work in this case but I'm not sure which would be clearer. I also looked into trying to do a limited form of 'start_decl' before parsing the type, but there were too many circular dependencies for me to work through, so I think any such changes would have to wait till GCC 16 (if they're even possible at all).

-- >8 --

This adds mangling support for lambdas with a mangling context of an alias template, and gives that context when instantiating such a lambda.

I think this is wrong, an alias is not an entity so it is not a definable item. The ABI change proposal also doesn't mention aliases.

Ah right, I see; I'd treated https://eel.is/c++draft/basic.def.odr#1.5 as being any template, but I see now it's "any templated entity" which is different (since as you say an alias isn't an entity).

In that case, how do you think we should handle class-scope alias templates of lambdas? Such a class is surely a definable item, and so e.g.

  struct S {
    template <int I> using X = decltype([]{ return I; });
  };
  using L1 = S::X<1>;
  using L2 = S::X<2>;

should this work and declare L1 to be the same type across TUs?

Hmm, I suppose it should. So then using the alias template name in the mangling is not because it's a definable item, but just as a convenient label to indicate where it appears in the class and what the template arguments apply to.

But even with that understanding, many of the changes in this patch to make aliases more special seem wrong, we shouldn't need those just to push/pop lambda scope?

Jason
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
On Wednesday, 2025-01-22 at 17:30 +0100, Michael Matz wrote:
> Hello,
>
> On Wed, 22 Jan 2025, Martin Uecker wrote:
>
> > > > In .y[1][3].z after .y you can decide whether y is a member of the
> > > > struct being initialized. If it is, it is a designator and if not
> > > > it must be an expression.
> > >
> > > If y is not a member it must be an expression, true. But if it's a
> > > member you don't know, it may be a designation or an expression.
> >
> > In an initializer I know all the members.
>
> My sentence was ambiguous :-) Trying again: When it's a member, and you
> know it's a member, then you still don't know if it's going to be a
> designation or an expression. It can be both.

I guess this depends on what you mean by "it can be". The rule would simply be that it is not an expression.

The rationale is the following: If it is inside the initializer of a structure and references a member of the same structure, then it can not simultaneously be inside the argument to a counted_by attribute used in the declaration of this structure (which at this time has already been parsed completely). So there is no reason to allow it to be interpreted as an expression, and the rule I proposed would therefore simply state that it then is not an expression.

struct {
  int n;
  int *buf [[counted_by(.n)]];  // this n is in a counted_by
} x = { .n };                   // this n can not be in counted_by for the same struct

If we allowed this to be interpreted as an expression, then you could use it to reference a member during initialization, e.g.

struct { int y; int x; } a = { .y = 1, .x = .y };

but this would be another extension unrelated to counted_by, which I did not intend to suggest.

There are other possibilities for disambiguation; we could also simply state that in initializers, at the syntactic position where a designator is allowed, it is always a designator and not an expression, and if it then does not reference a member of the structure being initialized, it is an error.
Maybe this is even preferable.

I would like to mention that what clang currently has in a prototype uses a mechanism called "delayed parsing", which is essentially infinite lookahead (in addition to being confusing and incoherent with C language rules). So IMHO we need something better.

My proposal for the moment would be to only allow very restricted syntactic forms, and not generic expressions, side-stepping all these issues.

Martin

>
> Ciao,
> Michael.
Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures
On Wednesday, 2025-01-22 at 18:11 +0100, Martin Uecker wrote:
> On Wednesday, 2025-01-22 at 16:37 +, Qing Zhao wrote:
> >
> > > On Jan 22, 2025, at 11:22, Martin Uecker wrote:
> > >
> > > Hello Michael,
> > >
> > > On Wednesday, 2025-01-22 at 16:54 +0100, Michael Matz wrote:
> > > > On Wed, 22 Jan 2025, Martin Uecker wrote:
> > > >
> > > > > > > So you do not need to look further. But maybe I am missing
> > > > > > > something else.
> > > > > >
> > > > > > Like ...
> > > > > >
> > > > > > > > Note further that you may have '{ .y[1][3].z }', which is still
> > > > > > > > not a designation, but an expression under your proposal, whereas
> > > > > > > > '{ .y[1][3].z = 1 }' would remain a designation. This shows that you
> > > > > > > > now need arbitrary look-ahead to disambiguate the two. A Very
> > > > > > > > Bad Idea.
> > > > > >
> > > > > > ... this?
> > > > >
> > > > > In .y[1][3].z after .y you can decide whether y is a member of the
> > > > > struct being initialized. If it is, it is a designator and if not
> > > > > it must be an expression.
> > > >
> > > > If y is not a member it must be an expression, true. But if it's a
> > > > member you don't know, it may be a designation or an expression.
> > >
> > > In an initializer I know all the members.
> >
> > I am not familiar with the parser, so, I am a little confused about the
> > following:
> >
> > Suppose we have:
> >
> > struct foo {
> >   int z;
> >   float f;
> > };
> >
> > struct bar {
> >   char *array __attribute__ ((counted_by (.y[1][3].z + 4)));
> >   struct foo y[5][10];
> > };
> >
> > So, in the above, when parsing the above expression inside counted_by,
> > can the current parser be easily extended to parse it?
>
> No, I don't think this can be done easily. The issue is that you do
> not know the declaration for y because it hasn't been parsed yet.
>
> If you forward reference some struct member, you have several
> possibilities:
>
> - use it only in limited contexts where you do not need to know
>   the type (e.g. this works for goto labels) or for a basic
>   counted_by attribute that only takes an identifier as we have it now.
>
> - simply assume it has a certain type (size_t as is proposed in the
>   WG14 paper Joseph mentioned) and fail later if it does not.
>
> Both options would rule out the construct above (but there could be
> workarounds).

One of the workarounds could be to instead call a function (which could be inlined later) and that function takes a pointer to the member. Then it does not need to know anything about any member, e.g.:

struct foo {
  int z;
  float f;
};

size_t bar_count(struct bar *);

struct bar {
  char *array __attribute__ ((counted_by (bar_count(__self__))));
  struct foo y[5][10];
};

size_t bar_count(struct bar *p) { return p->y[1][3].z + 4; }

> Other alternatives are:
>
> - you have some kind of forward declaration (as we have for
>   parameters as GNU extension). In the context of C, this is the
>   cleanest solution but either requires forward declaring the
>   full struct (which can be done in C23) or new syntax for only
>   forward declaring the member.

A possible C23 workaround could be:

struct foo {
  int z;
  float f;
};

struct bar {
  char *array __attribute__ ((counted_by (*)));  // star indicates missing size expression
  struct foo y[5][10];
};

struct bar {  // redeclare with known size
  char *array __attribute__ ((counted_by (.y[1][3].z + 4)));
  struct foo y[5][10];
};

Martin

> - or you use some delayed parsing where you store away the tokens
>   and parse it later when all structure members are done. I think
>   this is a highly problematic approach for a variety of reasons.
>
> Martin
>
> > thanks.
> >
> > Qing
Re: [PATCH] RISC-V: Disable two-source permutes for now [PR117173].
On 1/22/25 12:29 AM, Robin Dapp wrote: Hi, after testing on the BPI (4.2% improvement for x264 input 1, 4.4% for input 2) and the discussion in PR117173 I figured it's best to disable the two-source permutes by default for now. We quickly talked about this on the patchwork call last week. Conclusion was to just post the patch and discuss here. The patch adds a parameter "riscv-two-source-permutes" which restores the old behavior. It does not add a uarch knob to override the default. It's still possible to get two-source permutes in a register via the expander but the obvious constant cases are covered. Regtested on rv64gcv_zvl512b. Regards Robin PR target/117173 gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_generic_patterns): Only support single-source permutes by default. * config/riscv/riscv.opt: New param "riscv-two-source-permutes". gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Run with two-source permutes. * lib/target-supports.exp: Ditto. * gcc.dg/fold-perm-2.c: Ditto. * gcc.dg/pr54346.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-12.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: Ditto. So this isn't a regression, but I can also understand the desire to fix this fairly significant performance issue. IMHO this is extremely safe -- if rejecting two operand permutes here were to result in a failure I would argue that we've got a clear bug in our other permutation support that could be triggered without this change. So I'm OK with it as well. I think giving folks through the end of the week to chime in with potential objections would be wise though. jeff
[PATCH] Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613]
Dear all, while looking at details of a related but slightly different PR, I found that we did evaluate the arguments to MINLOC/MAXLOC too often in the inlined version. The attached patch creates temporaries for array elements where needed, and ensures that each array element is only touched once. This required a minor adjustment for the rank-1 algorithm, which is documented. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald From d2a52256b3e4817e16a5d222c2fecd7bc66e5613 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Wed, 22 Jan 2025 22:44:39 +0100 Subject: [PATCH] Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613] PR fortran/118613 gcc/fortran/ChangeLog: * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust algorithm for inlined version of MINLOC and MAXLOC so that arguments are only evaluted once, and create temporaries where necessary. Document change of algorithm. gcc/testsuite/ChangeLog: * gfortran.dg/maxval_arg_eval_count.f90: New test. --- gcc/fortran/trans-intrinsic.cc| 37 +++- .../gfortran.dg/maxval_arg_eval_count.f90 | 88 +++ 2 files changed, 121 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90 diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc index afbec5b2752..51237d0d3be 100644 --- a/gcc/fortran/trans-intrinsic.cc +++ b/gcc/fortran/trans-intrinsic.cc @@ -6409,8 +6409,16 @@ gfc_conv_intrinsic_findloc (gfc_se *se, gfc_expr *expr) nonempty = false; S = from; while (S <= to) { - if (mask[S]) { nonempty = true; if (a[S] <= limit) goto lab; } - S++; + if (mask[S]) { + nonempty = true; + if (a[S] <= limit) { + limit = a[S]; + S++; + goto lab; + } + else + S++; + } } limit = nonempty ? 
NaN : huge (limit); lab: @@ -6419,7 +6427,15 @@ gfc_conv_intrinsic_findloc (gfc_se *se, gfc_expr *expr) at runtime whether array is nonempty or not, rank 1: limit = Infinity; S = from; - while (S <= to) { if (a[S] <= limit) goto lab; S++; } + while (S <= to) { + if (a[S] <= limit) { + limit = a[S]; + S++; + goto lab; + } + else + S++; + } limit = (from <= to) ? NaN : huge (limit); lab: while (S <= to) { limit = min (a[S], limit); S++; } @@ -6710,6 +6726,7 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op) gfc_copy_loopinfo_to_se (&arrayse, &loop); arrayse.ss = arrayss; gfc_conv_expr_val (&arrayse, arrayexpr); + arrayse.expr = gfc_evaluate_now (arrayse.expr, &arrayse.pre); gfc_add_block_to_block (&block, &arrayse.pre); gfc_init_block (&block2); @@ -6722,7 +6739,18 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op) tmp = fold_build2_loc (input_location, op == GT_EXPR ? GE_EXPR : LE_EXPR, logical_type_node, arrayse.expr, limit); if (lab) - ifbody = build1_v (GOTO_EXPR, lab); + { + stmtblock_t ifblock; + tree inc_loop; + inc_loop = fold_build2_loc (input_location, PLUS_EXPR, + TREE_TYPE (loop.loopvar[0]), + loop.loopvar[0], gfc_index_one_node); + gfc_init_block (&ifblock); + gfc_add_modify (&ifblock, limit, arrayse.expr); + gfc_add_modify (&ifblock, loop.loopvar[0], inc_loop); + gfc_add_expr_to_block (&ifblock, build1_v (GOTO_EXPR, lab)); + ifbody = gfc_finish_block (&ifblock); + } else { stmtblock_t ifblock; @@ -6816,6 +6844,7 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op) gfc_copy_loopinfo_to_se (&arrayse, &loop); arrayse.ss = arrayss; gfc_conv_expr_val (&arrayse, arrayexpr); + arrayse.expr = gfc_evaluate_now (arrayse.expr, &arrayse.pre); gfc_add_block_to_block (&block, &arrayse.pre); /* MIN_EXPR/MAX_EXPR has unspecified behavior with NaNs or diff --git a/gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90 b/gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90 new file 
mode 100644 index 000..1c1a04819a0 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90 @@ -0,0 +1,88 @@ +! { dg-do run } +! +! PR fortran/118613 - check argument evaluation count of MAXVAL + +program p + implicit none + integer, parameter :: k = 2 + integer :: n + integer :: i1(k*k), i2(k,k), mm + real:: a1(k*k), a2(k,k), mx + complex :: c1(k*k), c2(k,k) + logical :: m1(k*k), m2(k,k) + + ! prepare mask for masked variants + m1 = .true. + m2 = .true. + i1 = 0 + i2 = 0 + a1 = 0. + a2 = 0. + c1 = 0. + c2 = 0. + + ! integer + n = 0 + mm = maxval (h(i1)) + if (n /= k*k .or. mm /= 0) stop 1 + n = 0 + mm = maxval (h(i2)) + if (n /= k*k .or. mm /= 0) stop 2 + n = 0 + mm = maxval (h(i1),m1) + if (n /= k*k .or. mm /= 0) stop 3 + n = 0 + mm = maxval (h(i2),m2) + if (n /= k*k .or. mm /= 0) stop 4 + + ! real + n = 0 + mx = maxval (f(a1)) + if (n /= k*k .or. mx /= 0) st
Re: [PATCH] i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases
On Wed, Jan 22, 2025 at 11:13 AM Haochen Jiang wrote: > > Hi all, > > These two testcases are misses on previous addition for > -march=x86-64-v3 to silence warning for -march=native tests. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/vnniint16-auto-vectorize-4.c: Append > -march=x86-64-v3. > * gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto. > --- > gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c | 2 +- > gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c > b/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c > index beaab189406..1bd1400d1a4 100644 > --- a/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c > +++ b/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c > @@ -1,5 +1,5 @@ > /* { dg-do run } */ > -/* { dg-options "-O2 -mavx10.2-512" } */ > +/* { dg-options "-O2 -march=x86-64-v3 -mavx10.2-512" } */ > /* { dg-require-effective-target avx10_2_512 } */ > > #define N 512 > diff --git a/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c > b/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c > index 70cd80c225b..dafab344d1d 100644 > --- a/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c > +++ b/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c > @@ -1,5 +1,5 @@ > /* { dg-do run } */ > -/* { dg-options "-O2 -mavx10.2-512" } */ > +/* { dg-options "-O2 -march=x86-64-v3 -mavx10.2-512" } */ > /* { dg-require-effective-target avx10_2_512 } */ > > #define N 512 > -- > 2.31.1 > -- BR, Hongtao
Re: [PATCH v3 2/6] c++: Fix mangling of otherwise unattached class-scope lambdas [PR118245]
On 1/6/25 7:21 AM, Nathaniel Shead wrote: Something like this should probably be backported to GCC 14 too, since my change in r14-9232-g3685fae23bb008 inadvertantly caused ICEs that this fixes. But without the previous patch this patch will cause ABI changes, and I'm not sure how easily it would be to divorce those changes from the fix here. I suppose probably the true issue is that r14-9232 inadvertantly changed ABI for lambdas in base classes, and we should just revert the parser.cc changes for 14.3? (And accept that it'll regress those modules tests.) I suppose so. -- >8 -- This is a step closer to implementing the suggested changes for https://github.com/itanium-cxx-abi/cxx-abi/pull/85. Most lambdas defined within a class should have an extra scope of that class so that uses across different TUs are properly merged by the linker. This also needs to happen during template instantiation. While I was working on this I found some other cases where the mangling of lambdas was incorrect and causing issues, notably the testcase lambda-ctx3.C which currently emits the same mangling for the base class and member lambdas, causing mysterious assembler errors since r14-9232. This is also the root cause of PR c++/118245. One notable case not handled either here or in the ABI is what is supposed to happen with lambdas declared in alias templates; see lambda-ctx4.C. I believe that by the C++ standard, such lambdas should also dedup across TUs, but this isn't currently implemented (for class-scope or not). I wasn't able to work out how to fix the mangling logic for this case easily so I've just excluded alias templates from the class-scope mangling rules in template instantiation. I'm skeptical of the instantiate_template change for other member templates as well, since the set of instantiations of such member templates might vary between translation units, so the lambda should be in the scope of that member template instantiation rather than just the class. 
And I don't see any other member templates in the testcases? Like, for

  struct S {
    template <int I> decltype([]{ return I; }) f() { return {}; };
  };
  void f(decltype(S().f<24>())*) {}
  void f(decltype(S().f<42>())*) {}

how are these lambdas mangled? Before this patch, the two lambdas have arbitrary discriminators in S but they are treated as TU-local so it doesn't matter. After this patch, they get the same number and are no longer TU-local, and assembly fails.

As with the member alias template example, I think it would be coherent to say the lambdas are TU-local because they're between a template parameter scope and a class/function/initializer, as a clarification to https://eel.is/c++draft/basic#link-15.2 . Or we need to mangle them in the scope of the member template instantiation even though they aren't within the body of the function.

Interestingly, clang mangles them in the scope of S::f, without any template arguments. That seems to have the same problem as just using S. And they aren't treated as TU-local, so the arbitrary discriminators are an ABI problem.

Jason
Re: [PATCH v3 4/6] c++: Update mangling of lambdas in expressions
On 1/6/25 7:23 AM, Nathaniel Shead wrote: https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that mangling a lambda expression should use 'L' rather than "tl". This only affects C++20 (and later) so no ABI flag is given. OK. gcc/cp/ChangeLog: * mangle.cc (write_expression): Update mangling for lambdas. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling. * g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise. Signed-off-by: Nathaniel Shead --- gcc/cp/mangle.cc | 2 +- gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C | 2 +- gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc index 9d457e2a2f3..e2e8fb2c71b 100644 --- a/gcc/cp/mangle.cc +++ b/gcc/cp/mangle.cc @@ -3769,7 +3769,7 @@ write_expression (tree expr) equivalent. So just use the closure type mangling. */ - write_string ("tl"); + write_char ('L'); write_type (LAMBDA_EXPR_CLOSURE (expr)); write_char ('E'); } diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C index 0051307f53d..306959a4f9f 100644 --- a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C @@ -6,4 +6,4 @@ struct C { void f(decltype([](T, auto) { return 0; })) {} }; void g() { C().f({}); } -// { dg-final { scan-assembler "_ZN1C1fIiEEvDTtlNS_UlT_TL0__E_EEE" } } +// { dg-final { scan-assembler "_ZN1C1fIiEEvDTLNS_UlT_TL0__E_EEE" } } diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C index dc7b0125631..b7bd4ecdd46 100644 --- a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C @@ -7,4 +7,4 @@ struct C { void f(decltype([](T, auto) { return 0; })) {} }; void g() { C().f({}); } -// { dg-final { scan-assembler "_ZN1C1fIiEEvDTtlNS_UlT_T_E_EEE" } } +// { dg-final { 
scan-assembler "_ZN1C1fIiEEvDTLNS_UlT_T_E_EEE" } }
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
The first error message in my previous email led me to the following constraint:

“C1130 A variable-name that appears in a LOCAL or LOCAL_INIT locality-spec shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL attribute, shall not be of finalizable type, shall not have an allocatable ultimate component,...”

My first thought was, "Holy guacamole. That seems like such a severe limitation that the feature seems almost useless." Fortuitously, however, it turns out that the code I sent a little while ago was missing one important feature from my intended use case: associate. As shown below, using associate eliminates the first error, but I'm still confused by the remaining error message. Are locality specifiers actually supported yet?

Damian

% cat locality.f90
program main
  implicit none
  integer pair
  integer :: mini_batch_size=1
  real, allocatable, dimension(:,:) :: a, dcdb
  allocate(a(1,1))
  allocate(dcdb(1,1))
  associate(a_ => a, dcdb_ => dcdb)
    do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
      a_ = 0.
      dcdb_ = 0.
    end do
  end associate
end program
% gfortran locality.f90
locality.f90:12:71:

   12 |     do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
      |                                                                       1
Error: Sorry, LOCAL and LOCAL_INIT are not yet supported for ‘do concurrent’ constructs at (1)
% gfortran --version
GNU Fortran (GCC) 15.0.1 20250119 (experimental)
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[PATCH] LoongArch: Fix ICE caused by illegal calls to builtin functions [PR118561].
PR target/118561 gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (loongarch_expand_builtin_lsx_test_branch): NULL_RTX will not be returned when an error is detected. (loongarch_expand_builtin): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118561.c: New test. --- gcc/config/loongarch/loongarch-builtins.cc| 7 +-- gcc/testsuite/gcc.target/loongarch/pr118561.c | 9 + 2 files changed, 14 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118561.c diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 92d995a916a..1849b35357c 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -2996,7 +2996,10 @@ loongarch_expand_builtin_lsx_test_branch (enum insn_code icode, tree exp) ops[1].value = force_reg (ops[1].mode, ops[1].value); if ((cbranch = maybe_gen_insn (icode, 3, ops)) == NULL_RTX) -error ("failed to expand built-in function"); +{ + error ("failed to expand built-in function"); + return const0_rtx; +} cmp_result = gen_reg_rtx (SImode); @@ -3036,7 +3039,7 @@ loongarch_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED, { error_at (EXPR_LOCATION (exp), "built-in function %qD is not enabled", fndecl); - return target; + return target ? target : const0_rtx; } switch (d->builtin_type) diff --git a/gcc/testsuite/gcc.target/loongarch/pr118561.c b/gcc/testsuite/gcc.target/loongarch/pr118561.c new file mode 100644 index 000..81a776eada3 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/pr118561.c @@ -0,0 +1,9 @@ +/* PR target/118561: ICE with -mfpu=none */ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=loongarch64 -mfpu=none" } */ + +int +test (void) +{ + return __builtin_loongarch_movfcsr2gr (0); /* { dg-error "built-in function '__builtin_loongarch_movfcsr2gr' is not enabled" } */ +} -- 2.34.1
Re: [PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]
On 2025/1/23 11:36 AM, Xi Ruoyao wrote:

On Thu, 2025-01-23 at 11:21 +0800, Lulu Cheng wrote:

On 2025/1/22 9:26 PM, Xi Ruoyao wrote:

The test case added in r15-7073 now triggers an ICE, indicating we need the same fix as AArch64.

gcc/ChangeLog:

	PR target/118501
	* config/loongarch/loongarch.md (@xorsign3): Use
	force_lowpart_subreg.

---

Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?

LGTM!

Here we try to use force_lowpart_subreg when can_create_pseudo_p () is true, right?

As this is a define_expand, can_create_pseudo_p should be just true (it turns to false when the reload pass starts, and reload is far after expand).

Um, that's not what I meant. What I meant is that in future code implementation, if can_create_pseudo_p is satisfied, we will not use lowpart_subreg but use force_lowpart_subreg.

I'm pushing the patch into master now.

 gcc/config/loongarch/loongarch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 0325994ebd6..04a9a79d548 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1343,8 +1343,8 @@ (define_expand "@xorsign3"
     machine_mode lsx_mode
       = mode == SFmode ? V4SFmode : V2DFmode;
     rtx tmp = gen_reg_rtx (lsx_mode);
-    rtx op1 = lowpart_subreg (lsx_mode, operands[1], mode);
-    rtx op2 = lowpart_subreg (lsx_mode, operands[2], mode);
+    rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], mode);
+    rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], mode);
     emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
     emit_move_insn (operands[0],
 		    lowpart_subreg (mode, tmp, lsx_mode));
[PATCH v2] libcpp: Fix incorrect line numbers in large files [PR108900]
From: Yash Shinde This patch addresses an issue in the C preprocessor where incorrect line number information is generated when processing files with a large number of lines. The problem arises from improper handling of location intervals in the line map, particularly when locations exceed LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES. By ensuring that the highest location is not decremented if it would move to a different ordinary map, this fix resolves the line number discrepancies observed in certain test cases. This change improves the accuracy of line number reporting, benefiting users relying on precise code coverage and debugging information. Signed-off-by: Jeremy Bettis Signed-off-by: Yash Shinde --- libcpp/files.cc | 8 1 file changed, 8 insertions(+) diff --git a/libcpp/files.cc b/libcpp/files.cc index 1ed19ca..3e6ca119ad5 100644 --- a/libcpp/files.cc +++ b/libcpp/files.cc @@ -1046,6 +1046,14 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, include_type type, && type < IT_DIRECTIVE_HWM && (pfile->line_table->highest_location != LINE_MAP_MAX_LOCATION - 1)); + + if (decrement && LINEMAPS_ORDINARY_USED (pfile->line_table)) +{ + const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table); + if (map && map->start_location == pfile->line_table->highest_location) + decrement = false; +} + if (decrement) pfile->line_table->highest_location--; -- 2.43.0
Re: [PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]
On 2025/1/22 9:26 PM, Xi Ruoyao wrote:

The test case added in r15-7073 now triggers an ICE, indicating we need the same fix as AArch64.

gcc/ChangeLog:

	PR target/118501
	* config/loongarch/loongarch.md (@xorsign3): Use
	force_lowpart_subreg.

---

Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?

LGTM!

Here we try to use force_lowpart_subreg when can_create_pseudo_p () is true, right?

 gcc/config/loongarch/loongarch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 0325994ebd6..04a9a79d548 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1343,8 +1343,8 @@ (define_expand "@xorsign3"
     machine_mode lsx_mode
       = mode == SFmode ? V4SFmode : V2DFmode;
     rtx tmp = gen_reg_rtx (lsx_mode);
-    rtx op1 = lowpart_subreg (lsx_mode, operands[1], mode);
-    rtx op2 = lowpart_subreg (lsx_mode, operands[2], mode);
+    rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], mode);
+    rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], mode);
     emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
     emit_move_insn (operands[0],
 		    lowpart_subreg (mode, tmp, lsx_mode));
Re: [PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]
On Thu, 2025-01-23 at 11:21 +0800, Lulu Cheng wrote:
>
> On 2025/1/22 9:26 PM, Xi Ruoyao wrote:
> > The test case added in r15-7073 now triggers an ICE, indicating we need
> > the same fix as AArch64.
> >
> > gcc/ChangeLog:
> >
> > 	PR target/118501
> > 	* config/loongarch/loongarch.md (@xorsign3): Use
> > 	force_lowpart_subreg.
> > ---
> >
> > Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?
>
> LGTM!
>
> Here we try to use force_lowpart_subreg when can_create_pseudo_p () is
> true, right?

As this is a define_expand, can_create_pseudo_p should be just true (it turns to false when the reload pass starts, and reload is far after expand).

I'm pushing the patch into master now.

> > gcc/config/loongarch/loongarch.md | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> > index 0325994ebd6..04a9a79d548 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -1343,8 +1343,8 @@ (define_expand "@xorsign3"
> >     machine_mode lsx_mode
> >       = mode == SFmode ? V4SFmode : V2DFmode;
> >     rtx tmp = gen_reg_rtx (lsx_mode);
> > -   rtx op1 = lowpart_subreg (lsx_mode, operands[1], mode);
> > -   rtx op2 = lowpart_subreg (lsx_mode, operands[2], mode);
> > +   rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], mode);
> > +   rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], mode);
> >     emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
> >     emit_move_insn (operands[0],
> > 		    lowpart_subreg (mode, tmp, lsx_mode));

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
I recently built the master branch of Iain's fork of gcc in order to test the support for locality specifiers. I have verified that the branch I built contains the commit mentioned in this email thread, 20b8500cfa522ebe0fcf756d5b32816da7f904dd. The code below isn't designed to do anything useful -- just to make sure the syntax works.

Could someone tell me why the second error message indicates that LOCAL is not supported yet? Also, the first error message is really interesting, because if I'm really not allowed to have anything allocatable inside LOCAL(), then I'm going to have to go back to the drawing board regarding my intended use case. Any advice?

Damian

% cat locality.f90
program main
  implicit none
  integer pair
  integer :: mini_batch_size=1
  real, allocatable, dimension(:,:) :: a, dcdb
  allocate(a(1,1))
  allocate(dcdb(1,1))
  do concurrent (pair = 1:mini_batch_size) local(a) reduce(+: dcdb)
    a = 0.
    dcdb = 0.
  end do
end program

% gfortran locality.f90
locality.f90:8:50:

    8 | do concurrent (pair = 1:mini_batch_size) local(a) reduce(+: dcdb)
      |                                                  1
Error: ALLOCATABLE attribute not permitted for 'a' in LOCAL locality-spec at (1)

locality.f90:8:67:

    8 | do concurrent (pair = 1:mini_batch_size) local(a) reduce(+: dcdb)
      |                                                                   1
Error: Sorry, LOCAL and LOCAL_INIT are not yet supported for 'do concurrent' constructs at (1)

% gfortran --version
GNU Fortran (GCC) 15.0.1 20250119 (experimental)
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

On Mon, Jan 13, 2025 at 5:41 PM Jerry D wrote:
> Committed as:
>
> commit 20b8500cfa522ebe0fcf756d5b32816da7f904dd (HEAD -> master,
> origin/master, origin/HEAD)
> Author: Anuj Mohite
> Date: Mon Jan 13 16:28:57 2025 -0800
>
>     Fortran: Add LOCALITY support for DO_CONCURRENT
>
>     This patch provided by Anuj Mohite as part of the GSoC
>     project.
> It is modified slightly by Jerry DeLisle for minor formatting. > The patch provides front-end parsing of the LOCALITY specs in > DO_CONCURRENT and adds numerous test cases. > > gcc/fortran/ChangeLog: > > * dump-parse-tree.cc (show_code_node): Updated to use > c->ext.concur.forall_iterator instead of > c->ext.forall_iterator. > * frontend-passes.cc (index_interchange): Updated to > use c->ext.concur.forall_iterator instead of > c->ext.forall_iterator. > (gfc_code_walker): Likewise. > * gfortran.h (enum locality_type): Added new enum for > locality types > in DO CONCURRENT constructs. > * match.cc (match_simple_forall): Updated to use > new_st.ext.concur.forall_iterator instead of > new_st.ext.forall_iterator. > (gfc_match_forall): Likewise. > (gfc_match_do): Implemented support for matching DO > CONCURRENT locality > specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and > REDUCE). > * parse.cc (parse_do_block): Updated to use > new_st.ext.concur.forall_iterator instead of > new_st.ext.forall_iterator. > * resolve.cc (struct check_default_none_data): Added struct > check_default_none_data. > (do_concur_locality_specs_f2023): New function to check > compliance > with F2023's C1133 constraint for DO CONCURRENT. > (check_default_none_expr): New function to check DEFAULT(NONE) > compliance. > (resolve_locality_spec): New function to resolve locality > specs. > (gfc_count_forall_iterators): Updated to use > code->ext.concur.forall_iterator. > (gfc_resolve_forall): Updated to use > code->ext.concur.forall_iterator. > * st.cc (gfc_free_statement): Updated to free locality > specifications > and use p->ext.concur.forall_iterator. > * trans-stmt.cc (gfc_trans_forall_1): Updated to use > code->ext.concur.forall_iterator. > > gcc/testsuite/ChangeLog: > > * gfortran.dg/do_concurrent_10.f90: New test. > * gfortran.dg/do_concurrent_8_f2018.f90: New test. > * gfortran.dg/do_concurrent_8_f2023.f90: New test. > * gfortran.dg/do_concurrent_9.f90: New test. 
> * gfortran.dg/do_concurrent_all_clauses.f90: New test. > * gfortran.dg/do_concurrent_basic.f90: New test. > * gfortran.dg/do_concurrent_constraints.f90: New test. > * gfortran.dg/do_concurrent_local_init.f90: New test. > * gfortran.dg/do_concurrent_locality_specs.f90: New test. > * gfortran.dg/do_concurrent_multiple_reduce.f90: New test. >
[PATCH v2] [ifcombine] avoid dropping tree_could_trap_p [PR118514]
On Jan 22, 2025, Alexandre Oliva wrote:
> I have another patch coming up that doesn't raise concerns for me, so
> I'll hold off from installing the above, in case you also prefer the
> other one.

Unlike other access patterns, BIT_FIELD_REFs aren't regarded as possibly-trapping merely for referencing out-of-bounds bits. So, if decode_field_reference finds a load that could trap, but whose inner object can't, bail out if it accesses past the inner object's size.

This may drop some optimizations in VLAs, that could be safe if we managed to reuse the same pre-existing load for both compares, but those get optimized later (as in the new testcase). The cases of most interest are those that merge separate fields, and they necessarily involve new BIT_FIELD_REF loads.

Regstrapped on x86_64-linux-gnu. Ok to install instead of the other one?

for gcc/ChangeLog

	PR tree-optimization/118514
	* gimple-fold.cc (decode_field_reference): Refuse to consider
	BIT_FIELD_REF of non-trapping object if the original load could
	trap for being out-of-bounds.
	(make_bit_field_ref): Check that replacement loads could trap as
	much as the original loads.
	(fold_truth_andor_for_ifcombine): Rearrange testing of reversep,
	and note that signbit need not be concerned with it.  Check that
	combinable loads have the same trapping status.
	* tree-eh.cc (bit_field_ref_in_bounds_p): New.
	(tree_could_trap_p): Call it on DECLs.
	* tree-eh.h (bit_field_ref_in_bounds_p): Declare.

for gcc/testsuite/ChangeLog

	PR tree-optimization/118514
	* gcc.dg/field-merge-23.c: New.
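For context, the transformation at stake merges adjacent narrow compares into one wider masked load and compare. A hypothetical example of the shape ifcombine targets (not the new field-merge-23.c testcase itself):

```c
#include <stdint.h>

struct s { uint8_t a; uint8_t b; };

/* Two adjacent single-byte compares; ifcombine can merge them into one
   16-bit load plus mask-and-compare.  The trap-safety fix above is about
   refusing such merges when the combined BIT_FIELD_REF would read bits
   past the bounds the original, possibly-trapping loads were allowed
   to read.  */
static int both_match (const struct s *p)
{
  return p->a == 1 && p->b == 2;
}
```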
--- gcc/gimple-fold.cc| 27 +-- gcc/testsuite/gcc.dg/field-merge-23.c | 19 +++ gcc/tree-eh.cc| 30 +- gcc/tree-eh.h |1 + 4 files changed, 70 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/field-merge-23.c diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index da3f505c3fcac..cd9d350349186 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -7686,10 +7686,14 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize, || bs <= shiftrt || offset != 0 || TREE_CODE (inner) == PLACEHOLDER_EXPR - /* Reject out-of-bound accesses (PR79731). */ - || (! AGGREGATE_TYPE_P (TREE_TYPE (inner)) - && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)), - bp + bs) < 0) + /* Reject out-of-bound accesses (PR79731). BIT_FIELD_REFs aren't flagged +as tree_could_trap_p just because of out-of-range bits, so don't even +try to optimize loads that could trap if they access out-of-range bits +of an object that could not trap (PR118514). */ + || ((! AGGREGATE_TYPE_P (TREE_TYPE (inner)) + || (load && tree_could_trap_p (gimple_assign_rhs1 (load)) + && !tree_could_trap_p (inner))) + && !bit_field_ref_in_bounds_p (inner, bs, bp)) || (INTEGRAL_TYPE_P (TREE_TYPE (inner)) && !type_has_mode_precision_p (TREE_TYPE (inner return NULL_TREE; @@ -7859,6 +7863,11 @@ make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type, gimple *new_stmt = gsi_stmt (i); if (gimple_has_mem_ops (new_stmt)) gimple_set_vuse (new_stmt, reaching_vuse); + gcc_checking_assert (! (gimple_assign_load_p (point) + && gimple_assign_load_p (new_stmt)) + || (tree_could_trap_p (gimple_assign_rhs1 (point)) + == tree_could_trap_p (gimple_assign_rhs1 +(new_stmt; } gimple_stmt_iterator gsi = gsi_for_stmt (point); @@ -8223,8 +8232,12 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, std::swap (rl_loc, rr_loc); } + /* Check that the loads that we're trying to combine have the same vuse and + same trapping status. */ if ((ll_load && rl_load) - ? 
gimple_vuse (ll_load) != gimple_vuse (rl_load) + ? (gimple_vuse (ll_load) != gimple_vuse (rl_load) +|| (tree_could_trap_p (gimple_assign_rhs1 (ll_load)) +!= tree_could_trap_p (gimple_assign_rhs1 (rl_load : (!ll_load != !rl_load)) return 0; @@ -8277,7 +8290,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, else if (lr_reversep != rr_reversep || ! operand_equal_p (lr_inner, rr_inner, 0) || ((lr_load && rr_load) - ? gimple_vuse (lr_load) != gimple_vuse (rr_load) + ? (gimple_vuse (lr_load) != gimple_vuse (rr_load) + || (tree_could_trap_p (gimple_assign_rhs1 (lr_load)) + != tree_could_trap_p (gimple_assign_rhs1 (rr_load : (!lr_load != !rr_load))) re
Re: [patch,avr] ad PR117726: Tweak 32-bit logical shifts of 25...30 for -Oz
On Wed, Jan 22, 2025 at 23:53, Georg-Johann Lay wrote:
>
> As it turns out, logical 32-bit shifts with an offset of 25..30 can
> be performed in 7 instructions or less. This beats the 7 instructions
> required for the default code of a shift loop.
> Plus, with zero overhead, these cases can be 3-operand.
>
> This is only relevant for -Oz because with -Os, 3op shifts are
> split with -msplit-bit-shift (which is not performed with -Oz).
>
> Passes without new regressions. Ok for trunk?

Ok. Please apply.

Denis
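The shifts in question are ordinary 32-bit logical shifts by 25..30 at the C level; only how avr-gcc sequences them at -Oz changes. A shift this large keeps at most 7 of the 32 input bits, which is why a short extract-and-shift sequence can beat the generic shift loop (the function below is just an illustrative instance, not code from the patch):

```c
#include <stdint.h>

/* Logical right shift by 26: only the top 6 bits of the input survive,
   so the result always fits in 0..0x3f.  */
static uint32_t lsr26 (uint32_t x)
{
  return x >> 26;
}
```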
[PATCH] [ifcombine] improve reverse checking and operand swapping
On Jan 22, 2025, Alexandre Oliva wrote: > I have another patch coming up that doesn't raise concerns for me, so > I'll hold off from installing the above, in case you also prefer the > other one. And here's an unrelated bit that came to mind while working on this, but that I split out before posting. Don't reject an ifcombine field-merging opportunity just because the left-hand operands aren't both reversed, if the second compare needs to be swapped for operands to match. Also mention that reversep does NOT affect the turning of range tests into bit tests. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog * gimple-fold.cc (fold_truth_andor_for_ifcombine): Document reversep's absence of effects on range tests. Don't reject reversep mismatches before trying compare swapping. --- gcc/gimple-fold.cc | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index 3c971a29ef045..da3f505c3fcac 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -8090,8 +8090,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, return 0; } - /* Prepare to turn compares of signed quantities with zero into - sign-bit tests. */ + /* Prepare to turn compares of signed quantities with zero into sign-bit + tests. We need not worry about *_reversep here for these compare + rewrites: loads will have already been reversed before compares. */ bool lsignbit = false, rsignbit = false; if ((lcode == LT_EXPR || lcode == GE_EXPR) && integer_zerop (lr_arg) @@ -8198,10 +8199,11 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, the rhs's. If one is a load and the other isn't, we have to be conservative and avoid the optimization, otherwise we could get SRAed fields wrong. */ - if (volatilep || ll_reversep != rl_reversep) + if (volatilep) return 0; - if (! operand_equal_p (ll_inner, rl_inner, 0)) + if (ll_reversep != rl_reversep + || ! 
operand_equal_p (ll_inner, rl_inner, 0))
     {
       /* Try swapping the operands.  */
       if (ll_reversep != rr_reversep

--
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive
[PATCH v2 3/4] RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]
From: Pan Li

This patch would like to fix the wrong code generation for the scalar signed SAT_SUB. The input can be QI/HI/SI/DI while the alu like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before ALU like sub.

The gen_lowpart will generate something like lbu which has all zero for highest bits. For example, when 0xff (-1 for QImode) sub 0x1 (1 for QImode), we actually want -1 - 1 = -2, but if there is no sign extend like lbu, we will get 0xff - 1 = 0xfe which is incorrect. Thus, we have to sign extend 0xff (QImode) to 0xffffffffffffffff (assume Xmode is DImode) before sub in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

	PR target/117688

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper
	riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr117688.h: Add test helper macro.
	* gcc.target/riscv/pr117688-sub-run-1-s16.c: New test.
	* gcc.target/riscv/pr117688-sub-run-1-s32.c: New test.
	* gcc.target/riscv/pr117688-sub-run-1-s64.c: New test.
	* gcc.target/riscv/pr117688-sub-run-1-s8.c: New test.
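The failure mode can be reproduced in plain C arithmetic. The sketch below models the two extension choices, assuming Xmode is 64 bits; it is not RISC-V code, and the function names are invented for illustration:

```c
#include <stdint.h>

/* Subtract QImode values after zero extension -- the effect of the
   lbu-style load that gen_lowpart produced (the bug).  */
static int64_t sub_zext (int8_t x, int8_t y)
{
  return (int64_t)(uint8_t)x - (int64_t)(uint8_t)y;  /* 0xff - 0x1 */
}

/* Subtract after sign extension -- what the fixed expander emits.  */
static int64_t sub_sext (int8_t x, int8_t y)
{
  return (int64_t)x - (int64_t)y;                    /* -1 - 1 */
}
```

With x = -1 and y = 1, the zero-extended path computes 0xfe (254) instead of the intended -2.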
Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 4 ++-- .../gcc.target/riscv/pr117688-sub-run-1-s16.c | 6 ++ .../gcc.target/riscv/pr117688-sub-run-1-s32.c | 6 ++ .../gcc.target/riscv/pr117688-sub-run-1-s64.c | 6 ++ .../gcc.target/riscv/pr117688-sub-run-1-s8.c | 6 ++ gcc/testsuite/gcc.target/riscv/pr117688.h | 21 +++ 6 files changed, 47 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 59148bfab91..4999044f198 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12903,8 +12903,8 @@ riscv_expand_sssub (rtx dest, rtx x, rtx y) machine_mode mode = GET_MODE (dest); unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant (); rtx shift_bits = GEN_INT (bitsize - 1); - rtx xmode_x = gen_lowpart (Xmode, x); - rtx xmode_y = gen_lowpart (Xmode, y); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND); rtx xmode_minus = gen_reg_rtx (Xmode); rtx xmode_xor_0 = gen_reg_rtx (Xmode); rtx xmode_xor_1 = gen_reg_rtx (Xmode); diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c new file mode 100644 index 000..7b375bb6c85 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int16_t, INT16_MIN, INT16_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c new file mode 100644 index 000..ba0e8fc8ea5 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int32_t, INT32_MIN, INT32_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c new file mode 100644 index 000..c24c549af30 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int64_t, INT64_MIN, INT64_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c new file mode 100644 index 000..67f9df179a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h b/gcc/testsuite/gcc.target/riscv/pr117688.h index 1013a8a8012..3b734ce62ed 100644 --- a/gcc/testsuite/gcc.target/riscv/pr117688.h +++ b/gcc/testsuite/gcc.target/riscv/pr117688.h @@ -24,4 +24,25 @@ return 0; \ } +#define DEFINE_SIGNED_SAT_SUB_RUN(T, MIN, MAX) \ + T x, y, result;
[PATCH v2 2/4] RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]
From: Pan Li

This patch would like to fix the wrong code generation for the scalar signed SAT_ADD. The input can be QI/HI/SI/DI while the alu like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before ALU like add.

The gen_lowpart will generate something like lbu which has all zero for highest bits. For example, when 0xff (-1 for QImode) plus 0x2 (2 for QImode), we actually want -1 + 2 = 1, but if there is no sign extend like lbu, we will get 0xff + 2 = 0x101 which is incorrect. Thus, we have to sign extend 0xff (QImode) to 0xffffffffffffffff (assume Xmode is DImode) before plus in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

	PR target/117688

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper
	riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr117688-add-run-1-s16.c: New test.
	* gcc.target/riscv/pr117688-add-run-1-s32.c: New test.
	* gcc.target/riscv/pr117688-add-run-1-s64.c: New test.
	* gcc.target/riscv/pr117688-add-run-1-s8.c: New test.
	* gcc.target/riscv/pr117688.h: New test.
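The same mis-extension can be shown for the addition case in plain C, again assuming a 64-bit Xmode (illustrative model, not RISC-V code):

```c
#include <stdint.h>

/* Add QImode values after zero extension -- the buggy gen_lowpart
   behaviour: 0xff + 0x2 = 0x101.  */
static int64_t add_zext (int8_t x, int8_t y)
{
  return (int64_t)(uint8_t)x + (int64_t)(uint8_t)y;
}

/* Add after sign extension -- the fixed behaviour: -1 + 2 = 1.  */
static int64_t add_sext (int8_t x, int8_t y)
{
  return (int64_t)x + (int64_t)y;
}
```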
Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 4 +-- .../gcc.target/riscv/pr117688-add-run-1-s16.c | 6 + .../gcc.target/riscv/pr117688-add-run-1-s32.c | 6 + .../gcc.target/riscv/pr117688-add-run-1-s64.c | 6 + .../gcc.target/riscv/pr117688-add-run-1-s8.c | 6 + gcc/testsuite/gcc.target/riscv/pr117688.h | 27 +++ 6 files changed, 53 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688.h diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 3b9a2c5500d..59148bfab91 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12799,8 +12799,8 @@ riscv_expand_ssadd (rtx dest, rtx x, rtx y) machine_mode mode = GET_MODE (dest); unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant (); rtx shift_bits = GEN_INT (bitsize - 1); - rtx xmode_x = gen_lowpart (Xmode, x); - rtx xmode_y = gen_lowpart (Xmode, y); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND); rtx xmode_sum = gen_reg_rtx (Xmode); rtx xmode_dest = gen_reg_rtx (Xmode); rtx xmode_xor_0 = gen_reg_rtx (Xmode); diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c new file mode 100644 index 000..21ec107cbf1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int16_t, INT16_MIN, INT16_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c new file 
mode 100644 index 000..1f197d1280b --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int32_t, INT32_MIN, INT32_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c new file mode 100644 index 000..4903bc854d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int64_t, INT64_MIN, INT64_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c new file mode 100644 index 000..a9f2fe7f192 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h b/gcc/testsuite/gcc.target/riscv/pr117688.h new file mode 100644 index 000..1013a8a8012 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688.h @@ -0,0 +1,27 @@ +#ifndef HAVE_DEFINED_PR117688_H +#define HAVE_DEFINED_PR117688_H + +#include + +#define DEFINE_SIGNED_SAT_ADD_RUN(T, MIN, MAX)\ + T x, y, re
[PATCH v2 1/4] RISC-V: Refactor SAT_* operand rtx extend to reg help func [NFC]
From: Pan Li

This patch would like to refactor the helper function of the SAT_* scalar. The helper function will convert the define_pattern ops to the xmode reg for the underlying code-gen. This patch adds a new parameter for ZERO_EXTEND or SIGN_EXTEND if the input is const_int or the mode is non-Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ...
	(riscv_extend_to_xmode_reg): Rename to and add rtx_code for
	zero/sign extend if non-Xmode.
	(riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND.
	(riscv_expand_ussub): Ditto.

Signed-off-by: Pan Li

--- gcc/config/riscv/riscv.cc | 61 +++ 1 file changed, 36 insertions(+), 25 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 5a3a0504177..3b9a2c5500d 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12639,17 +12639,29 @@ riscv_get_raw_result_mode (int regno) return default_get_reg_raw_mode (regno); } -/* Generate a REG rtx of Xmode from the given rtx and mode. +/* Generate a REG rtx of Xmode from the given rtx and rtx_code. The rtx x can be REG (QI/HI/SI/DI) or const_int. The machine_mode mode is the original mode from define pattern. + The rtx_code can be ZERO_EXTEND or SIGN_EXTEND. - If rtx is REG and Xmode, the RTX x will be returned directly. + If rtx is REG: - If rtx is REG and non-Xmode, the zero extended to new REG of Xmode will be - returned. + 1. If rtx Xmode, the RTX x will be returned directly. + 2. if rtx non-Xmode, the extended to new REG of Xmode will be returned. - If rtx is const_int, a new REG rtx will be created to hold the value of - const_int and then returned. + The scalar ALU like add don't support non-Xmode like QI/HI. Then the + gen_lowpart will have problem here. For example, when we would like + to add -1(0xff if QImode) and 2(0x2 if QImode). The 0xff and 0x2 will + be loaded to register for adding.
Aka: + + 0xff + 0x2 = 0x101 instead of -1 + 2 = 1. + + Thus we need to sign extend 0xff to 0x if Xmode is DImode + for correctness. Similar the unsigned also need zero extend. + + If rtx is const_int: + + 1. A new REG rtx will be created to hold the value of const_int. According to the gccint doc, the constants generated for modes with fewer bits than in HOST_WIDE_INT must be sign extended to full width. Thus there @@ -12667,28 +12679,27 @@ riscv_get_raw_result_mode (int regno) the REG rtx of Xmode, instead of taking care of these in expand func. */ static rtx -riscv_gen_zero_extend_rtx (rtx x, machine_mode mode) +riscv_extend_to_xmode_reg (rtx x, machine_mode mode, enum rtx_code rcode) { + gcc_assert (rcode == ZERO_EXTEND || rcode == SIGN_EXTEND); + rtx xmode_reg = gen_reg_rtx (Xmode); - if (!CONST_INT_P (x)) + if (CONST_INT_P (x)) { if (mode == Xmode) - return x; - - riscv_emit_unary (ZERO_EXTEND, xmode_reg, x); - return xmode_reg; + emit_move_insn (xmode_reg, x); + else + { + rtx x_reg = gen_reg_rtx (mode); + emit_move_insn (x_reg, x); + riscv_emit_unary (rcode, xmode_reg, x_reg); + } } - - if (mode == Xmode) -emit_move_insn (xmode_reg, x); + else if (mode == Xmode) +return x; else -{ - rtx reg_x = gen_reg_rtx (mode); - - emit_move_insn (reg_x, x); - riscv_emit_unary (ZERO_EXTEND, xmode_reg, reg_x); -} +riscv_emit_unary (rcode, xmode_reg, x); return xmode_reg; } @@ -12709,8 +12720,8 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y) machine_mode mode = GET_MODE (dest); rtx xmode_sum = gen_reg_rtx (Xmode); rtx xmode_lt = gen_reg_rtx (Xmode); - rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode); - rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, ZERO_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, ZERO_EXTEND); rtx xmode_dest = gen_reg_rtx (Xmode); /* Step-1: sum = x + y */ @@ -12844,8 +12855,8 @@ void riscv_expand_ussub (rtx dest, rtx x, rtx y) { machine_mode mode = GET_MODE (dest); - rtx xmode_x = 
riscv_gen_zero_extend_rtx (x, mode); - rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, ZERO_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, ZERO_EXTEND); rtx xmode_lt = gen_reg_rtx (Xmode); rtx xmode_minus = gen_reg_rtx (Xmode); rtx xmode_dest = gen_reg_rtx (Xmode); -- 2.43.0
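To make the renamed helper's contract concrete, here is a hypothetical C model of the value-level extension it arranges (assuming Xmode is DImode, i.e. 64 bits; the function name and interface here are invented for illustration and are not the GCC API):

```c
#include <stdint.h>

/* Extend the low `bits` bits of `raw` to 64 bits, zero- or sign-extending
   as requested -- the value-level effect riscv_extend_to_xmode_reg is
   meant to guarantee before the Xmode ALU operation runs.  */
static uint64_t extend_to_xmode (uint64_t raw, unsigned bits, int sign_extend)
{
  if (bits >= 64)
    return raw;
  uint64_t mask = ((uint64_t)1 << bits) - 1;
  raw &= mask;
  if (sign_extend && (raw >> (bits - 1)) != 0)
    raw |= ~mask;   /* replicate the sign bit into the upper bits */
  return raw;
}
```

For QImode -1 (0xff), the sign-extending variant yields 0xffffffffffffffff, while zero extension yields 0xff; the unsigned SAT_* expanders keep the latter, the signed ones now request the former.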
[PATCH v2 4/4] RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]
From: Pan Li

This patch would like to fix the wrong code generation for the scalar signed SAT_TRUNC. The input can be QI/HI/SI/DI while the alu like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before ALU like add.

The gen_lowpart will generate something like lbu which has all zero for highest bits. For example, when 0xff7f (-129 for HImode) trunc to QImode, we actually want to compare -129 to -128, but if there is no sign extend like lbu, we will compare 0xff7f to 0xff80 (assume Xmode is DImode). Thus, we have to sign extend 0xff7f (HImode) to 0xffffffffffffff7f (assume Xmode is DImode) before compare in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

	PR target/117688

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the
	helper riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr117688.h: Add test helper macros.
	* gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test.
	* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test.
	* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test.
	* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test.
	* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test.
	* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test.
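The mis-compare can again be modelled in plain C, assuming a 64-bit Xmode (illustrative sketch with invented names, not the expander itself):

```c
#include <stdint.h>

/* Saturating-truncate HImode to QImode must compare the source against
   INT8_MIN as a signed value.  Zero extension turns -129 into 0xff7f
   (65407), which is not below -128, so the saturation check misfires.  */
static int below_min_zext (int16_t v)
{
  return (int64_t)(uint16_t)v < -128;   /* buggy path */
}

/* With sign extension the comparison sees the real value -129.  */
static int below_min_sext (int16_t v)
{
  return (int64_t)v < -128;             /* fixed path */
}
```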
Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 2 +- .../riscv/pr117688-trunc-run-1-s16-to-s8.c| 6 + .../riscv/pr117688-trunc-run-1-s32-to-s16.c | 6 + .../riscv/pr117688-trunc-run-1-s32-to-s8.c| 6 + .../riscv/pr117688-trunc-run-1-s64-to-s16.c | 6 + .../riscv/pr117688-trunc-run-1-s64-to-s32.c | 6 + .../riscv/pr117688-trunc-run-1-s64-to-s8.c| 6 + gcc/testsuite/gcc.target/riscv/pr117688.h | 22 +++ 8 files changed, 59 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 4999044f198..f7b00bb713f 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -13014,7 +13014,7 @@ riscv_expand_sstrunc (rtx dest, rtx src) rtx xmode_narrow_min = gen_reg_rtx (Xmode); rtx xmode_lt = gen_reg_rtx (Xmode); rtx xmode_gt = gen_reg_rtx (Xmode); - rtx xmode_src = gen_lowpart (Xmode, src); + rtx xmode_src = riscv_extend_to_xmode_reg (src, GET_MODE (src), SIGN_EXTEND); rtx xmode_dest = gen_reg_rtx (Xmode); rtx xmode_mask = gen_reg_rtx (Xmode); rtx xmode_sat = gen_reg_rtx (Xmode); diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c new file mode 100644 index 000..df84615a25f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_TRUNC_RUN(int16_t, int8_t, INT8_MIN, 
INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c new file mode 100644 index 000..1b0f860eb55 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int16_t, INT16_MIN, INT16_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c new file mode 100644 index 000..e412a29df36 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c new file mode 100644 index 000..234d33b1895 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-a
[PATCH] [ifcombine] check for more zero-extension cases [PR118572]
When comparing a signed narrow variable with a wider constant that has the
bit corresponding to the variable's sign bit set, we would check that the
constant is a sign-extension from that sign bit, and conclude that the
compare fails if it isn't.

When the signed variable is masked without getting the [lr]l_signbit
variable set, or when the sign bit itself is masked out, we know the
sign-extension bits from the extended variable are going to be zero, so the
constant will only compare equal if it is a zero- rather than
sign-extension from the narrow variable's precision.  Therefore, check that
it satisfies this property, and yield a false compare result otherwise.

for  gcc/ChangeLog

	PR tree-optimization/118572
	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
	unsigned the variables whose extension bits are masked out.

for  gcc/testsuite/ChangeLog

	PR tree-optimization/118572
	* gcc.dg/field-merge-24.c: New.
---
 gcc/gimple-fold.cc                    | 20 +++++++++++++++++---
 gcc/testsuite/gcc.dg/field-merge-24.c | 36 ++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-24.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index cd9d350349186..13541fe1ef749 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8552,12 +8552,21 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
     {
       /* Before clipping upper bits of the right-hand operand of the compare,
	 check that they're sign or zero extensions, depending on how the
-	 left-hand operand would be extended.  */
+	 left-hand operand would be extended.  If it is unsigned, or if there's
+	 a mask that zeroes out extension bits, whether because we've checked
+	 for upper bits in the mask and did not set ll_signbit, or because the
+	 sign bit itself is masked out, check that the right-hand operand is
+	 zero-extended.  */
       bool l_non_ext_bits = false;
       if (ll_bitsize < lr_bitsize)
	{
	  wide_int zext = wi::zext (l_const, ll_bitsize);
-	  if ((ll_unsignedp ? zext : wi::sext (l_const, ll_bitsize)) == l_const)
+	  if ((ll_unsignedp
+	       || (ll_and_mask.get_precision ()
+		   && (!ll_signbit
+		       || ((ll_and_mask & wi::mask (ll_bitsize - 1, true, ll_bitsize))
+			   == 0)))
+	       ? zext : wi::sext (l_const, ll_bitsize)) == l_const)
	    l_const = zext;
	  else
	    l_non_ext_bits = true;
@@ -8583,7 +8592,12 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
       if (rl_bitsize < rr_bitsize)
	{
	  wide_int zext = wi::zext (r_const, rl_bitsize);
-	  if ((rl_unsignedp ? zext : wi::sext (r_const, rl_bitsize)) == r_const)
+	  if ((rl_unsignedp
+	       || (rl_and_mask.get_precision ()
+		   && (!rl_signbit
+		       || ((rl_and_mask & wi::mask (rl_bitsize - 1, true, rl_bitsize))
+			   == 0)))
+	       ? zext : wi::sext (r_const, rl_bitsize)) == r_const)
	    r_const = zext;
	  else
	    r_non_ext_bits = true;
diff --git a/gcc/testsuite/gcc.dg/field-merge-24.c b/gcc/testsuite/gcc.dg/field-merge-24.c
new file mode 100644
index 0000000000000..ce5bce7d0b49c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-24.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+/* PR tree-optimization/118572 */
+/* Check that signed compares with constants that seem signed in the other
+   compare operand's width get treated as unsigned if its upper bits are
+   masked out.  */
+
+__attribute__((noipa))
+int test(signed char c)
+{
+  return (((0x80 & (c&0xff)) != 0) && ((0xc0 & (c&0xff)) == 0x80));
+}
+
+__attribute__((noipa))
+int test2(signed char c)
+{
+  return (((-128 & (c&-1)) != 0) && ((-64 & (c&-1)) == -128));
+}
+
+__attribute__((noipa))
+int test3(signed char c)
+{
+  return (((0x80 & (c&-1)) != 0) && ((0x1248c0 & (c&-1)) == 0x124880));
+}
+
+__attribute__((noipa))
+int test4(signed char c)
+{
+  return (((0x400 & (c&-1)) == 0) && ((0x40 & (c&-1)) == 0x40));
+}
+
+int main() {
+  if (test(0x80) == 0 || test2(-128) == 0 || test3(-128) == 0 || test4(64) == 0)
+    __builtin_abort();
+}
-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
Hi Damian,

looking into the code I find neither the keyword LOCAL nor REDUCE. The
match routine also gives no hint that those keywords are supported in a
do concurrent loop.

And here is the existing bug ticket:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101602

But I don't see a ticket for REDUCE yet.

Sorry for the bad news.

Regards,
	Andre

On Wed, 22 Jan 2025 20:59:02 -0800
Damian Rouson wrote:

> The first error message in my previous email led me to the following
> constraint:
>
> "C1130 A variable-name that appears in a LOCAL or LOCAL_INIT
> locality-spec shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL
> attribute, shall not be of finalizable type, shall not have an
> allocatable ultimate component,..."
>
> My first thought was, "Holy guacamole. That seems like such a severe
> limitation that the feature seems almost useless." Fortuitously,
> however, it turns out that the code I sent a little while ago was
> missing one important feature from my intended use case: associate.
> As shown below, using associate eliminates the first error, but I'm
> still confused by the remaining error message. Are locality specifiers
> actually supported yet?
>
> Damian
>
> % cat locality.f90
>
> program main
>   implicit none
>   integer pair
>   integer :: mini_batch_size=1
>   real, allocatable, dimension(:,:) :: a, dcdb
>
>   allocate(a(1,1))
>   allocate(dcdb(1,1))
>
>   associate(a_ => a, dcdb_ => dcdb)
>     do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
>       a_ = 0.
>       dcdb_ = 0.
>     end do
>   end associate
>
> end program
>
> % gfortran locality.f90
>
> locality.f90:12:71:
>
>    12 | do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
>       |                                                                   1
> Error: Sorry, LOCAL and LOCAL_INIT are not yet supported for
> ‘do concurrent’ constructs at (1)
>
> % gfortran --version
>
> GNU Fortran (GCC) 15.0.1 20250119 (experimental)
> Copyright (C) 2025 Free Software Foundation, Inc.
>
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-- 
Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] RISC-V: Disable two-source permutes for now [PR117173].
> Could you show me the a piece of codegen difference in X264 that make
> performance improved ?

I have one ready from SATD (see PR117173), there are more.

"Before":

  _838 = VEC_PERM_EXPR ;
  _846 = VEC_PERM_EXPR ;

"After":

  _42 = VEC_PERM_EXPR ;
  ...
  _44 = VEC_PERM_EXPR ;
  _45 = VEC_PERM_EXPR ;

"After" is matched in match.pd and converted to "before".  "Before"
requires two masked, complex gathers while "after" needs no masking,
just a vmerge and single-source gathers.
Re: [PATCH] RISC-V: Disable two-source permutes for now [PR117173].
lgtm.

On Wed, Jan 22, 2025 16:04 Robin Dapp
Re: [PATCH] lra: emit caller-save register spills before call insn [PR116028]
On Tue, Jan 21, 2025 at 5:45 PM Andrew Pinski wrote:
>
> On Thu, Aug 8, 2024 at 2:07 PM Andrew Pinski wrote:
> >
> > On Fri, Aug 2, 2024 at 7:30 AM Jeff Law wrote:
> > >
> > > On 8/1/24 4:12 AM, Surya Kumari Jangala wrote:
> > > > lra: emit caller-save register spills before call insn [PR116028]
> > > >
> > > > LRA emits insns to save caller-save registers in the
> > > > inheritance/splitting pass.  In this pass, LRA builds EBBs (Extended
> > > > Basic Block) and traverses the insns in the EBBs in reverse order from
> > > > the last insn to the first insn.  When LRA sees a write to a pseudo (that
> > > > has been assigned a caller-save register), and there is a read following
> > > > the write, with an intervening call insn between the write and read,
> > > > then LRA generates a spill immediately after the write and a restore
> > > > immediately before the read.  The spill is needed because the call insn
> > > > will clobber the caller-save register.
> > > >
> > > > If there is a write insn and a call insn in two separate BBs but
> > > > belonging to the same EBB, the spill insn gets generated in the BB
> > > > containing the write insn.  If the write insn is in the entry BB, then
> > > > the spill insn that is generated in the entry BB prevents shrink wrap
> > > > from happening.  This is because the spill insn references the stack
> > > > pointer and hence the prolog gets generated in the entry BB itself.
> > > >
> > > > This patch ensures that the spill insn is generated before the call insn
> > > > instead of after the write.  This is also more efficient as the spill now
> > > > occurs only in the path containing the call.
> > > >
> > > > 2024-08-01  Surya Kumari Jangala
> > > >
> > > > gcc/
> > > > 	PR rtl-optimization/PR116028
> > > > 	* lra-constraints.cc (split_reg): Spill register before call
> > > > 	insn.
> > > > 	(latest_call_insn): New variable.
> > > > 	(inherit_in_ebb): Track the latest call insn.
> > > >
> > > > gcc/testsuite/
> > > > 	PR rtl-optimization/PR116028
> > > > 	* gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
> > > > 	* gcc.dg/pr10474.c: Remove xfail for powerpc.
> > > Implementation looks fine.  I would suggest a comment indicating why
> > > we're inserting before last_call_insn.  Otherwise someone in the future
> > > would have to find the patch submission to know why we're handling that
> > > case specially.
> > >
> > > OK with that additional comment.
> >
> > This causes bootstrap failure on aarch64-linux-gnu; self-tests fail at
> > stage 2.  Looks to be wrong code is produced compiling stage 2
> > compiler.
> > I have not looked further than that right now.
>
> I decided to re-apply the patch to the trunk locally and see if I
> could debug what was going wrong.  The good news is the bootstrap
> failure is gone.
> The bad news is I don't know why though.  I am going to see if I can
> bisect where the failure mode I was getting disappears.  That should
> help decide if the bug has gone latent or really been fixed.

Just a small update before I go to bed.  Here is some information on
the revisions which work/don't work (with the patch re-applied):

r15-5746-g0547dbb725b6d8 fails the way it was reported above
r15-6350-gbb829ce157f8b4 fails with a bootstrap comparison
r15-6351-g24df430108c0cd fails with a bootstrap comparison
r15-7077-g96f4ba4d19a765 works
r15-7116-g3f641a8f1d1fafc0c6531aee185d0e74998987d5 works

I will be testing more revisions tomorrow.  But it looks like we are
down to ~700 revisions.

Thanks,
Andrew Pinski

> Thanks,
> Andrew Pinski
>
> > Thanks,
> > Andrew
> >
> > > Thanks,
> > > jeff