Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)
On Tue, Mar 4, 2025 at 9:49 AM Richard Sandiford wrote: > > Wilco Dijkstra writes: > > Hi Richard&Kyrill, > > > >>> I’m in favour of this. > >> > >> Yeah, seems ok to me too. I suppose we ought to update the documentation > >> too: > > > > I've added a note to the documentation. However it is impossible to be > > complete here > > since many targets switch off early scheduling under various circumstances. > > Yeah, that's fair. :) > > > So I've just mentioned what x86 and AArch64 do (see v2 below). > > SGTM. Getting it right for two target is a strict improvement > over the status quo. Wording suggestion below though: Note the documentation issue for x86 not enabling the pre-RA scheduler at -O2+ is PR 38768; though it was closed as fixed but it was not actually fixed (I think there was a missing reading of the sources). So please add a reference to PR 38768 in the commit message. Thanks, Andrew Pinski > > > Cheers, > > Wilco > > > > > > v2: Update documentation > > > > Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 > > benefits > > from much faster build times with -O2, but avoids the regressions in lbm > > which > > is very sensitive to minor scheduling changes due to long FMA chains. > > > > gcc: > > PR target/118351 > > * common/config/aarch64/aarch64-common.cc: Enable early scheduling > > with > > -O3 and higher. > > * doc/invoke.texi (-fschedule-insns): Update comment. > > > > --- > > > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > > b/gcc/common/config/aarch64/aarch64-common.cc > > index > > 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd > > 100644 > > --- a/gcc/common/config/aarch64/aarch64-common.cc > > +++ b/gcc/common/config/aarch64/aarch64-common.cc > > @@ -54,8 +54,10 @@ static const struct default_options > > aarch_option_optimization_table[] = > > { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 }, > > /* Enable -fsched-pressure by default when optimizing. */ > > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > > -/* Disable early scheduling due to high compile-time overheads. */ > > +/* Except for -O3 and higher, disable early scheduling due to high > > + compile-time overheads. */ > > { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, > > +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 }, > > /* Enable redundant extension instructions removal at -O2 and higher. > > */ > > { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, > > { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL }, > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index > > bad49a017cc1bb0922eba9f0b2db80166d409cd7..5eb2e96c483126ad075b427969e8c02ce3f51e7a > > 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -13503,7 +13503,9 @@ helps machines that have slow floating point or > > memory load instructions > > by allowing other instructions to be issued until the result of the load > > or floating-point instruction is required. > > > > -Enabled at levels @option{-O2}, @option{-O3}. > > +Enabled at levels @option{-O2}, @option{-O3}. Many targets use a different > > +default, for example, it is disabled on x86 and enabled only at level > > +@option{-O3} on AArch64. > > Very minor, and this is going to be personal taste, but how about: > > Conventionally enabled at optimization levels @option{-O2} and @option{-O3}. > However, many targets override this behavior. For example, on x86, it is > disabled at all levels, while on AArch64, it is enabled only at @option{-O3}. > > (I'm just trying to avoid the second sentence contradicting the first.) > > OK with that change if noone objects in 24hrs. > > Thanks, > Richard >
Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)
Wilco Dijkstra writes: > Hi Richard&Kyrill, > >>> I’m in favour of this. >> >> Yeah, seems ok to me too. I suppose we ought to update the documentation >> too: > > I've added a note to the documentation. However it is impossible to be > complete here > since many targets switch off early scheduling under various circumstances. Yeah, that's fair. :) > So I've just mentioned what x86 and AArch64 do (see v2 below). SGTM. Getting it right for two target is a strict improvement over the status quo. Wording suggestion below though: > Cheers, > Wilco > > > v2: Update documentation > > Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits > from much faster build times with -O2, but avoids the regressions in lbm which > is very sensitive to minor scheduling changes due to long FMA chains. > > gcc: > PR target/118351 > * common/config/aarch64/aarch64-common.cc: Enable early scheduling > with > -O3 and higher. > * doc/invoke.texi (-fschedule-insns): Update comment. > > --- > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/gcc/common/config/aarch64/aarch64-common.cc > index > 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd > 100644 > --- a/gcc/common/config/aarch64/aarch64-common.cc > +++ b/gcc/common/config/aarch64/aarch64-common.cc > @@ -54,8 +54,10 @@ static const struct default_options > aarch_option_optimization_table[] = > { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 }, > /* Enable -fsched-pressure by default when optimizing. */ > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > -/* Disable early scheduling due to high compile-time overheads. */ > +/* Except for -O3 and higher, disable early scheduling due to high > + compile-time overheads. */ > { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, > +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 }, > /* Enable redundant extension instructions removal at -O2 and higher. */ > { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, > { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL }, > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index > bad49a017cc1bb0922eba9f0b2db80166d409cd7..5eb2e96c483126ad075b427969e8c02ce3f51e7a > 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -13503,7 +13503,9 @@ helps machines that have slow floating point or > memory load instructions > by allowing other instructions to be issued until the result of the load > or floating-point instruction is required. > > -Enabled at levels @option{-O2}, @option{-O3}. > +Enabled at levels @option{-O2}, @option{-O3}. Many targets use a different > +default, for example, it is disabled on x86 and enabled only at level > +@option{-O3} on AArch64. Very minor, and this is going to be personal taste, but how about: Conventionally enabled at optimization levels @option{-O2} and @option{-O3}. However, many targets override this behavior. For example, on x86, it is disabled at all levels, while on AArch64, it is enabled only at @option{-O3}. (I'm just trying to avoid the second sentence contradicting the first.) OK with that change if noone objects in 24hrs. Thanks, Richard
Re: [PATCH v2] c++: Check invalid use of constrained auto with trailing return type [PR100589]
On 3/4/25 9:51 AM, Patrick Palka wrote: On Sun, 2 Mar 2025, Da Xie wrote: Add check for constrained auto type specifier in function declaration or function type declaration with trailing return type. Issue error if such usage is detected. Test file renamed, and added a new test for type declaration. Successfully bootstrapped and regretested on x86_64-pc-linux-gnu: Added 6 passed and 4 unsupported tests. PR c++/100589 gcc/cp/ChangeLog: * decl.cc (grokdeclarator): Issue an error for a declarator with constrained auto type specifier and trailing return types. Include function names if available. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-pr100589.C: New test. LGTM, thanks! Jason, shall I push this to trunk? Please.
Re: [PATCH] c++: wrong targs printed in hard satisfaction error [PR99214]
On 3/4/25 2:49 PM, Patrick Palka wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/14? -- >8 -- In the three-parameter version of satisfy_declaration_constraints, when 't' isn't the most general template, then 't' won't correspond with 'args' after we augment the latter via add_outermost_template_args, and so the instantiation context that we push via push_tinst_level isn't quite correct: 'args' is a complete set of template arguments, but 't' is not necessarily the most general template. This manifests as misleading diagnostic context lines when issuing a hard error (or a constraint recursion error) that occurred during satisfaction, e.g. for the below testcase without this patch we emit: In substitution of '... void A::f() [with U = int]' and with this patch we emit: In substitution of '... void A::f() [with U = char; T = int]'. This patch fixes this by always passing the most general template to push_tinst_level. That soungs good, but getting it by passing it back from get_normalized_constraints_from_decl seems confusing; I'd think we should calculate it in parallel to changing args to correspond to that template. PR c++/99214 gcc/cp/ChangeLog: * constraint.cc (get_normalized_constraints_from_decl): New out-parameter GEN_D. (satisfy_declaration_constraints): Use it to pass the most general version of T to push_tinst_level. gcc/testsuite/ChangeLog: * g++.dg/concepts/diagnostic20.C: New test. --- gcc/cp/constraint.cc | 15 +++ gcc/testsuite/g++.dg/concepts/diagnostic20.C | 15 +++ 2 files changed, 26 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic20.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index a9caba8e2cc..f688a99c5fd 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -648,10 +648,13 @@ get_normalized_constraints_from_info (tree ci, tree in_decl, bool diag = false) return t; } -/* Returns the normalized constraints for the declaration D. */ +/* Returns the normalized constraints for the declaration D. + If GEN_D is non-NULL, sets *GEN_D to the most general version + of D that ultimately owns its constraints. */ static tree -get_normalized_constraints_from_decl (tree d, bool diag = false) +get_normalized_constraints_from_decl (tree d, bool diag = false, + tree *gen_d = nullptr) { tree tmpl; tree decl; @@ -716,6 +719,8 @@ get_normalized_constraints_from_decl (tree d, bool diag = false) tmpl = most_general_template (tmpl); d = tmpl ? tmpl : decl; + if (gen_d) +*gen_d = d; /* If we're not diagnosing errors, use cached constraints, if any. */ if (!diag) @@ -2730,9 +2735,11 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info) return boolean_true_node; tree result = boolean_true_node; - if (tree norm = get_normalized_constraints_from_decl (t, info.noisy ())) + tree gen_t; + if (tree norm = get_normalized_constraints_from_decl (t, info.noisy (), + &gen_t)) { - if (!push_tinst_level (t, args)) + if (!push_tinst_level (gen_t, args)) return result; tree pattern = DECL_TEMPLATE_RESULT (t); push_to_top_level (); diff --git a/gcc/testsuite/g++.dg/concepts/diagnostic20.C b/gcc/testsuite/g++.dg/concepts/diagnostic20.C new file mode 100644 index 000..b8d586e9a21 --- /dev/null +++ b/gcc/testsuite/g++.dg/concepts/diagnostic20.C @@ -0,0 +1,15 @@ +// PR c++/99214 +// { dg-do compile { target c++20 } } + +template +struct A { + template static void f() requires ([] { return U::fail; }()); // { dg-error "fail" } + template static void f(); +}; + +int main() { + A::f(); +} + +// This matches the context line "In substitution of '... [with U = char; T = int]'" +// { dg-message "U = char; T = int" "" { target *-*-* } 0 }
Re: [patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists
Hi Andre, while the patch looks ok to me, why did you not choose to generate a "22.7.2 Variable-Length Parameter Lists" https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Variable-Number-of-Arguments.html when the arguments differ? Then its the callee responsibility to figure stuff out. Just out of curiosity. Is this prohibited by any standard? My question is just out of interest. No need to change the patch. Having a warning is better, then running into "WTF?". A variable-length parameter list is something that the caller and the callee must agree on, otherwise this can (and will) cause breakage on many ABIs - aarch64 and Power among them. x86_64 is an exception, there it is mostly harmless. Look good to me, ok for mainline. Committed as r15-7817-g21ca9153ebe525b077ac96811cfd48be6b259e7e . Thanks for the review! I will mention the new option in changes.html shortly. Best regards Thomas
Re: [PATCH] go/119098 - bump libgo version for GCC 15 release
On Tue, Mar 4, 2025 at 2:12 AM Richard Biener wrote: > As PR119098 shows the gccgo driver uses the install path compiled > into the shared libgo and thus updating that but not the driver > will cause the driver to fail to find other tools like cgo. > > Thus bump the SONAME for GCC 15 as well, as we've done for each > release in the past. > > I think libgo changes go through another "upstream", so Ian, can you > please take care of this? > Thanks, committed. I'll try to fix the underlying issue. Ian
[PATCH] c++: wrong targs printed in hard satisfaction error [PR99214]
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/14? -- >8 -- In the three-parameter version of satisfy_declaration_constraints, when 't' isn't the most general template, then 't' won't correspond with 'args' after we augment the latter via add_outermost_template_args, and so the instantiation context that we push via push_tinst_level isn't quite correct: 'args' is a complete set of template arguments, but 't' is not necessarily the most general template. This manifests as misleading diagnostic context lines when issuing a hard error (or a constraint recursion error) that occurred during satisfaction, e.g. for the below testcase without this patch we emit: In substitution of '... void A::f() [with U = int]' and with this patch we emit: In substitution of '... void A::f() [with U = char; T = int]'. This patch fixes this by always passing the most general template to push_tinst_level. PR c++/99214 gcc/cp/ChangeLog: * constraint.cc (get_normalized_constraints_from_decl): New out-parameter GEN_D. (satisfy_declaration_constraints): Use it to pass the most general version of T to push_tinst_level. gcc/testsuite/ChangeLog: * g++.dg/concepts/diagnostic20.C: New test. --- gcc/cp/constraint.cc | 15 +++ gcc/testsuite/g++.dg/concepts/diagnostic20.C | 15 +++ 2 files changed, 26 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic20.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index a9caba8e2cc..f688a99c5fd 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -648,10 +648,13 @@ get_normalized_constraints_from_info (tree ci, tree in_decl, bool diag = false) return t; } -/* Returns the normalized constraints for the declaration D. */ +/* Returns the normalized constraints for the declaration D. + If GEN_D is non-NULL, sets *GEN_D to the most general version + of D that ultimately owns its constraints. */ static tree -get_normalized_constraints_from_decl (tree d, bool diag = false) +get_normalized_constraints_from_decl (tree d, bool diag = false, + tree *gen_d = nullptr) { tree tmpl; tree decl; @@ -716,6 +719,8 @@ get_normalized_constraints_from_decl (tree d, bool diag = false) tmpl = most_general_template (tmpl); d = tmpl ? tmpl : decl; + if (gen_d) +*gen_d = d; /* If we're not diagnosing errors, use cached constraints, if any. */ if (!diag) @@ -2730,9 +2735,11 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info) return boolean_true_node; tree result = boolean_true_node; - if (tree norm = get_normalized_constraints_from_decl (t, info.noisy ())) + tree gen_t; + if (tree norm = get_normalized_constraints_from_decl (t, info.noisy (), + &gen_t)) { - if (!push_tinst_level (t, args)) + if (!push_tinst_level (gen_t, args)) return result; tree pattern = DECL_TEMPLATE_RESULT (t); push_to_top_level (); diff --git a/gcc/testsuite/g++.dg/concepts/diagnostic20.C b/gcc/testsuite/g++.dg/concepts/diagnostic20.C new file mode 100644 index 000..b8d586e9a21 --- /dev/null +++ b/gcc/testsuite/g++.dg/concepts/diagnostic20.C @@ -0,0 +1,15 @@ +// PR c++/99214 +// { dg-do compile { target c++20 } } + +template +struct A { + template static void f() requires ([] { return U::fail; }()); // { dg-error "fail" } + template static void f(); +}; + +int main() { + A::f(); +} + +// This matches the context line "In substitution of '... [with U = char; T = int]'" +// { dg-message "U = char; T = int" "" { target *-*-* } 0 } -- 2.49.0.rc0.57.gdb91954e18
Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)
Hi Ramana, > -Generate code for the large code model. This makes no assumptions about > -addresses and sizes of sections. Programs can be statically linked only. > The > +Generate code for the large code model. This allows large .bss and .data > +sections, however .text and .rodata must still be < 2GiB in total. Position > +independent code or statically linked libraries are not supported. The > @option{-mcmodel=large} option is incompatible with @option{-mabi=ilp32}, > -@option{-fpic} and @option{-fPIC}. > +@option{-fpie}, @option{-fPIE}, @option{-fpic}, @option{-fPIC}, > +@option{-static}, @option{-moutline-atomics}. > What's the issue with -static ? Basically the small and large model are fundamentally incompatible. The infamous "dumb linker" approach means it doesn't try to sort sections, so an ADRP relocation will be out of reach if its data is placed after a huge array. Static linking with GLIBC or enabling outline atomics are in fact the same issue. As I proposed for the medium model, the solution is to place large arrays into specially named sections so that they are ordered last. That means all "small" data referenced via ADRP will be in range, and thus using outline atomics or mixing small/medium model just work. The large model will have to do the same. Cheers, Wilco
Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]
On Tue, Mar 4, 2025 at 6:31 PM Richard Biener wrote: > > On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford > wrote: > > > > Richard Sandiford writes: > > > Jan Hubicka writes: > > >>> > > >>> Thanks for running these. I saw poor results for perlbench with my > > >>> initial aarch64 hooks because the hooks reduced the cost to zero for > > >>> the entry case: > > >>> > > >>> auto entry_cost = targetm.callee_save_cost > > >>> (spill_cost_type::SAVE, hard_regno, mode, saved_nregs, > > >>>ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs, > > >>>allocated_callee_save_regs, existing_spills_p); > > >>> /* In the event of a tie between caller-save and callee-save, > > >>>prefer callee-save. We apply this to the entry cost rather > > >>>than the exit cost since the entry frequency must be at > > >>>least as high as the exit frequency. */ > > >>> if (entry_cost > 0) > > >>> entry_cost -= 1; > > >>> > > >>> I "fixed" that by bumping the cost to a minimum of 2, but I was > > >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > > > >>> 1", > > >>> so that the cost is always greater than not using a callee save for > > >>> registers that don't cross a call. WDYT? > > >> > > >> For x86 perfomance costs, the push cost should be memory_move_cost which > > >> is 6, -2 for adjustment in the target hook and -1 for this. So cost > > >> should not be 0 I think. > > >> > > >> For size cost, I currently return 1, so we indeed get 0 after > > >> adjustment. > > >> > > >> I think cost of 0 will make us to pick callee save even if caller save > > >> is available and there are no function calls, so I guess we do not want > > >> that > > > > > > OK, here's an updated patch that makes that change. The x86 parts > > > should be replaced by your patch. > > > > > > Tested on aarch64-linux-gnu. I also tried to test on > > > pwoerpc64el-linux-gnu > > > (on gcc112), but I keep getting broken pipes during the test runs, > > > so I'm struggling to get good before/after comparisons. It does at > > > least bootstrap though... > > > > Here's the patch with Honza's x86 changes. Boostrapped & regresiion-tested > > on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120). The powerpc64le > > results regressed: > > > > FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 i == 5 > > FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 sizeof > > (a) == 17 * sizeof (short) > > > > but the same test already failed for -O2 and -O3. > > > > OK to install now? Or, given the lateness in the release cycle, > > would it be better to wait for GCC 16? > > I think it's OK to install now. Not installing anything isn't an option, the > alternative would be to at least revert HJs change. I'm hoping to install this patch in GCC15. > > Thanks, > Richard. > > > > > Thanks, > > Richard > > > > > > Following on from the discussion in: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html > > > > this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and > > replaces it with two hooks: one that controls the cost of using an > > extra callee-saved register and one that controls the cost of allocating > > a frame for the first spill. > > > > (The patch does not attempt to address the shrink-wrapping part of > > the thread above.) > > > > On AArch64, this is enough to fix PR117477, as verified by the new tests. > > The patch does not change the SPEC2017 scores significantly. (I saw a > > slight improvement in fotonik3d and roms, but I'm not convinced that > > the improvements are real.) > > > > The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c, > > which is a scan-dump correctness test that relies on not using > > caller saves. The decision to use caller saves looks appropriate, > > and saves an instruction, so I've just added -fno-caller-saves > > to the test options. > > > > The x86 parts were written by Honza. > > > > gcc/ > > PR rtl-optimization/117477 > > * config/aarch64/aarch64.cc (aarch64_count_saves): New function. > > (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost) > > (aarch64_frame_allocation_cost): Likewise. > > (TARGET_CALLEE_SAVE_COST): Define. > > (TARGET_FRAME_ALLOCATION_COST): Likewise. > > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > > Replace with... > > (ix86_callee_save_cost): ...this new hook. > > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. > > (TARGET_CALLEE_SAVE_COST): Define. > > * target.h (spill_cost_type, frame_cost_type): New enums. > > * target.def (callee_save_cost, frame_allocation_cost): New hooks. > > (ira_callee_saved_register_cost_scale): Delete. > > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > > Delete. > >
Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]
LGTM! Thanks. 在 2025/3/3 下午3:29, Xi Ruoyao 写道: They could be incorrectly reordered with store instructions like st.b because the RTL expression does not have a memory_operand or a (mem) expression. The incorrect reorder has been observed in openh264 LTO build. Expand them to a (mem) expression instead of unspec to fix the issue. Then we need to make loongarch_address_insns return 1 for ADDRESS_REG_REG because the constraint "R" expects this behavior, or the vldx instruction will be considered invalid by the register allocate pass and turned to add.d + vld. Apply the ADDRESS_REG_REG penalty in loongarch_address_cost instead, loongarch_rtx_costs should also call loongarch_address_cost instead of loongarch_address_insns then. Closes: https://github.com/cisco/openh264/issues/3857 gcc/ChangeLog: PR target/119084 * config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove. (lasx_xvldx): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove. (lsx_vldx): Remove. * config/loongarch/simd.md (QIVEC): New define_mode_iterator. (_vldx): New define_expand. * config/loongarch/loongarch.cc (loongarch_address_insns_1): New static function with most logic factored out from ... (loongarch_address_insns): ... here. Call loongarch_address_insns_1 with reg_reg_cost = 1. (loongarch_address_cost): Call loongarch_address_insns_1 with reg_reg_cost = la_addr_reg_reg_cost. gcc/testsuite/ChangeLog: PR target/119084 * gcc.target/loongarch/pr119084.c: New test. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk and gcc-14 branch? gcc/config/loongarch/lasx.md | 13 - gcc/config/loongarch/loongarch.cc | 48 +++ gcc/config/loongarch/lsx.md | 13 - gcc/config/loongarch/simd.md | 9 gcc/testsuite/gcc.target/loongarch/pr119084.c | 24 ++ 5 files changed, 61 insertions(+), 46 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr119084.c diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index e4505c1660d..43e3ab0026a 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -119,7 +119,6 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVSSRLRN UNSPEC_LASX_XVEXTL_QU_DU UNSPEC_LASX_XVLDI - UNSPEC_LASX_XVLDX UNSPEC_LASX_XVSTX UNSPEC_LASX_VECINIT_MERGE UNSPEC_LASX_VEC_SET_INTERNAL @@ -3579,18 +3578,6 @@ (define_insn "lasx_xvldi" [(set_attr "type" "simd_load") (set_attr "mode" "V4DI")]) -(define_insn "lasx_xvldx" - [(set (match_operand:V32QI 0 "register_operand" "=f") - (unspec:V32QI [(match_operand:DI 1 "register_operand" "r") - (match_operand:DI 2 "reg_or_0_operand" "rJ")] - UNSPEC_LASX_XVLDX))] - "ISA_HAS_LASX" -{ - return "xvldx\t%u0,%1,%z2"; -} - [(set_attr "type" "simd_load") - (set_attr "mode" "V32QI")]) - (define_insn "lasx_xvstx" [(set (mem:V32QI (plus:DI (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "reg_or_0_operand" "rJ"))) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index eb3baac7019..3779e283f8d 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -2363,14 +2363,9 @@ loongarch_index_address_p (rtx addr, machine_mode mode ATTRIBUTE_UNUSED) return true; } -/* Return the number of instructions needed to load or store a value - of mode MODE at address X. Return 0 if X isn't valid for MODE. - Assume that multiword moves may need to be split into word moves - if MIGHT_SPLIT_P, otherwise assume that a single load or store is - enough. */ - -int -loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) +static int +loongarch_address_insns_1 (rtx x, machine_mode mode, bool might_split_p, + int reg_reg_cost) { struct loongarch_address_info addr; int factor; @@ -2405,7 +2400,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) return factor; case ADDRESS_REG_REG: - return factor * la_addr_reg_reg_cost; + return factor * reg_reg_cost; case ADDRESS_CONST_INT: return lsx_p ? 0 : factor; @@ -2420,6 +2415,18 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) return 0; } +/* Return the number of instructions needed to load or store a value + of mode MODE at address X. Return 0 if X isn't valid for MODE. + Assume that multiword moves may need to be split into word moves + if MIGHT_SPLIT_P, otherwise assume that a single load or store is + enough. */ + +int +loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) +{ + return loongarch_address_insns_1 (x, mode, might_split_p, 1); +} + /* Return true if X fits within an
Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote: > LGTM! Pushed to trunk. The draft of gcc-14 backport is attached, I'll push it if it builds & tests fine and there's no objection. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University From 6e4ca8b9c4944ffe896f4d8952f29c4cd518df22 Mon Sep 17 00:00:00 2001 From: Xi Ruoyao Date: Sun, 2 Mar 2025 19:02:50 +0800 Subject: [gcc14 PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084] They could be incorrectly reordered with store instructions like st.b because the RTL expression does not have a memory_operand or a (mem) expression. The incorrect reorder has been observed in openh264 LTO build. Expand them to a (mem) expression instead of unspec to fix the issue. Closes: https://github.com/cisco/openh264/issues/3857 (cherry picked from commit 4856292f7a680ec478e7607f1b71781996d7d542) Edited to remove the loongarch.cc change which is not needed for gcc-14 branch. gcc/ChangeLog: PR target/119084 * config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove. (lasx_xvldx): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove. (lsx_vldx): Remove. * config/loongarch/simd.md (QIVEC): New define_mode_iterator. (_vldx): New define_expand. gcc/testsuite/ChangeLog: PR target/119084 * gcc.target/loongarch/pr119084.c: New test. --- gcc/config/loongarch/lasx.md | 13 -- gcc/config/loongarch/lsx.md | 13 -- gcc/config/loongarch/simd.md | 10 gcc/testsuite/gcc.target/loongarch/pr119084.c | 24 +++ 4 files changed, 34 insertions(+), 26 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr119084.c diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 94bbd0c26bb..fe32194e811 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -155,7 +155,6 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVSSRLRN UNSPEC_LASX_XVEXTL_QU_DU UNSPEC_LASX_XVLDI - UNSPEC_LASX_XVLDX UNSPEC_LASX_XVSTX UNSPEC_LASX_VECINIT_MERGE UNSPEC_LASX_VEC_SET_INTERNAL @@ -4679,18 +4678,6 @@ (define_insn "lasx_xvldi" [(set_attr "type" "simd_load") (set_attr "mode" "V4DI")]) -(define_insn "lasx_xvldx" - [(set (match_operand:V32QI 0 "register_operand" "=f") - (unspec:V32QI [(match_operand:DI 1 "register_operand" "r") - (match_operand:DI 2 "reg_or_0_operand" "rJ")] - UNSPEC_LASX_XVLDX))] - "ISA_HAS_LASX" -{ - return "xvldx\t%u0,%1,%z2"; -} - [(set_attr "type" "simd_load") - (set_attr "mode" "V32QI")]) - (define_insn "lasx_xvstx" [(set (mem:V32QI (plus:DI (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "reg_or_0_operand" "rJ"))) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 5ee5845e84b..67ba8e8ad5d 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -100,7 +100,6 @@ (define_c_enum "unspec" [ UNSPEC_LSX_VSSRLRN UNSPEC_LSX_VLDI UNSPEC_LSX_VSHUF_B - UNSPEC_LSX_VLDX UNSPEC_LSX_VSTX UNSPEC_LSX_VEXTL_QU_DU UNSPEC_LSX_VSETEQZ_V @@ -3070,18 +3069,6 @@ (define_insn "lsx_vshuf_b" [(set_attr "type" "simd_shf") (set_attr "mode" "V16QI")]) -(define_insn "lsx_vldx" - [(set (match_operand:V16QI 0 "register_operand" "=f") - (unspec:V16QI [(match_operand:DI 1 "register_operand" "r") - (match_operand:DI 2 "reg_or_0_operand" "rJ")] - UNSPEC_LSX_VLDX))] - "ISA_HAS_LSX" -{ - return "vldx\t%w0,%1,%z2"; -} - [(set_attr "type" "simd_load") - (set_attr "mode" "V16QI")]) - (define_insn "lsx_vstx" [(set (mem:V16QI (plus:DI (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "reg_or_0_operand" "rJ"))) diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index 00ff2823a4e..691e7195636 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -113,6 +113,16 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3") ;; instruction here so we can avoid duplicating logics. ;; === +;; REG + REG load + +(define_mode_iterator QIVEC [(V16QI "ISA_HAS_LSX") (V32QI "ISA_HAS_LASX")]) +(define_expand "_vldx" + [(set (match_operand:QIVEC 0 "register_operand" "=f") + (mem:QIVEC (plus:DI (match_operand:DI 1 "register_operand") + (match_operand:DI 2 "register_operand"] + "TARGET_64BIT") + + ;; ;; FP vector rounding instructions ;; diff --git a/gcc/testsuite/gcc.target/loongarch/pr119084.c b/gcc/testsuite/gcc.target/loongarch/pr119084.c new file mode 100644 index 000..b5943303851 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/pr119084.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mlsx" } */ +/* { dg-require-effective-target loongarch_sx_hw } */ + +typedef signed char V16QI __attribute__ ((vector_size (16))); +static char x[128]; + +__attribute__ ((noipa)) int +noopt (int x) +{
Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]
在 2025/3/5 上午11:03, Xi Ruoyao 写道: On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote: LGTM! Pushed to trunk. The draft of gcc-14 backport is attached, I'll push it if it builds & tests fine and there's no objection. Thanks a lot.
Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)
> On 3 Mar 2025, at 19:52, Wilco Dijkstra wrote: > > > Outline atomics is not designed to be used with -mcmodel=large, so disable > it automatically if the large code model is used. > > Passes regress, OK for commit? > This restriction should be documented in invoke.texi IMO. I also think it would be more user friendly to warn them about the incompatibility if an explicit -moutline-atomics option is passed. It’s okay though to silently turn off the implicit default-on option though. Thanks, Kyrill > gcc: >PR target/112465 >* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1): >Turn off outline atomics with -mcmodel=large. > > --- > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..31d083d11bfc3e9756b41c901c96749a7b8a840a > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct > gcc_options *opts) > intermediary step for the former. */ > if (flag_mlow_precision_sqrt) > flag_mrecip_low_precision_sqrt = true; > + > + /* Turn off outline atomics with -mcmodel=large. */ > + if (aarch64_cmodel == AARCH64_CMODEL_LARGE) > +opts->x_aarch64_flag_outline_atomics = 0; > } > > /* 'Unpack' up the internal tuning structs and update the options > >
Re: [Fortran, Patch, PR77872, v1] Fix ICE when getting caf-token from abstract class type.
Hi Steve, thanks for the review. Committed as gcc-15-7804-g5bd66483839. Thanks again, Andre On Mon, 3 Mar 2025 12:34:49 -0800 Steve Kargl wrote: > On Mon, Mar 03, 2025 at 03:58:24PM +0100, Andre Vehreschild wrote: > > > > attached patches fix a 12-regression, when a caf token is requested from an > > abstract class-typed dummy. The token was not looked up in the correct spot. > > Due the class typed object getting an artificial variable for direct derived > > type access, the get_caf_decl was looking at the wrong decl. > > > > This patch consists of two parts, the first is just some code complexity > > reduction, where an existing attr is now used instead of checking for > > BT_CLASS type and branching. > > > > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? > > > > Thanks. OK to commit. > -- Andre Vehreschild * Email: vehre ad gmx dot de
[Fortran, Patch, PR103391, v1] Fix gimple error on assign to pointer array.
Hi all, attached patch fixes a 12-regression when an assignment to a pointer array is done. The issue was a missing indirect ref on assign as it was already done for allocatable arrays. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 27ca62555c4b09349ab33f806b386b485dfe7c8a Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Tue, 4 Mar 2025 12:56:20 +0100 Subject: [PATCH] Fortran: Fix gimplification error on assignment to pointer [PR103391] PR fortran/103391 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_trans_assignment_1): Do not use poly assign for pointer arrays on lhs (as it is done for allocatables already). gcc/testsuite/ChangeLog: * gfortran.dg/assign_12.f90: New test. --- gcc/fortran/trans-expr.cc | 16 +++--- gcc/testsuite/gfortran.dg/assign_12.f90 | 28 + 2 files changed, 36 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/assign_12.f90 diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 0d790b63f95..fbe7333fd71 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -12876,14 +12876,14 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag, needed. */ lhs_attr = gfc_expr_attr (expr1); - is_poly_assign = (use_vptr_copy || lhs_attr.pointer - || (lhs_attr.allocatable && !lhs_attr.dimension)) - && (expr1->ts.type == BT_CLASS - || gfc_is_class_array_ref (expr1, NULL) - || gfc_is_class_scalar_expr (expr1) - || gfc_is_class_array_ref (expr2, NULL) - || gfc_is_class_scalar_expr (expr2)) - && lhs_attr.flavor != FL_PROCEDURE; + is_poly_assign += (use_vptr_copy + || ((lhs_attr.pointer || lhs_attr.allocatable) && !lhs_attr.dimension)) + && (expr1->ts.type == BT_CLASS || gfc_is_class_array_ref (expr1, NULL) + || gfc_is_class_scalar_expr (expr1) + || gfc_is_class_array_ref (expr2, NULL) + || gfc_is_class_scalar_expr (expr2)) + && lhs_attr.flavor != FL_PROCEDURE; assoc_assign = is_assoc_assign (expr1, expr2); diff --git a/gcc/testsuite/gfortran.dg/assign_12.f90 b/gcc/testsuite/gfortran.dg/assign_12.f90 new file mode 100644 index 000..be31021f24c --- /dev/null +++ b/gcc/testsuite/gfortran.dg/assign_12.f90 @@ -0,0 +1,28 @@ +!{ dg-do run } +! +! Check assignment works for derived types to memory referenced by pointer +! Contributed by G. Steinmetz + +program pr103391 + type t + character(1) :: c + end type + type t2 + type(t), pointer :: a(:) + end type + + type(t), target :: arr(2) + type(t2) :: r + + arr = [t('a'), t('b')] + + r = f([arr]) + if (any(r%a(:)%c /= ['a', 'b'])) stop 1 +contains + function f(x) + class(t), intent(in), target :: x(:) + type(t2) :: f + allocate(f%a(size(x,1))) + f%a = x + end +end -- 2.48.1
Re: [Fortran, Patch, PR103391, v1] Fix gimple error on assign to pointer array.
Hi Andre, You beat me to it! I had just noticed the missing indirect reference. Yes, this is good for mainline and backporting to 14-branch at very least. Thanks for the patch. If you want to do the push to mainline, I will do the backporting if you like? I'll not be back at base for a little while so there will be a suitable delay. I have several Steinmetz sourced regression fixes to bring to the list in the next days. Cheers Paul On Tue, 4 Mar 2025 at 12:02, Andre Vehreschild wrote: > Hi all, > > attached patch fixes a 12-regression when an assignment to a pointer array > is > done. The issue was a missing indirect ref on assign as it was already > done for > allocatable arrays. > > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? > > Regards, > Andre > -- > Andre Vehreschild * Email: vehre ad gmx dot de >
Re: [patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists
Hi Thomas, Am 03.03.25 um 22:50 schrieb Thomas Koenig: Hello world, this patch is a bit more complicated than originally envisioned. The problem was that we were not handling external dummy arguments with -fc-prototypes-external. In looking at this, I found that we were not warning about external procedures with different argument lists. This can actually be legal (see the two test cases) but creates a problem for the C prototypes: If we have something like subroutine foo(a,n) external a if (n == 1) call a(1) if (n == 2) call a(2,3) end subroutine foo then, pre-C23, we could just have written out the prototype as void foo_ (void (*a) (), int *n); but this is illegal in C23. What to do? I finally chose to warn about the argument mismatch, with a new option. Warn only because the code above is legal, but include in -Wall because such code seems highly suspect. This option is also implied in -fc-prototypes-external. I also put a warning in the generated header file in that case, so users have a chance to see what is going on (especially since gcc now defaults to C23). Regression-tested. Comments? Suggestions for better wordings? Is -Wall too strong, should this be -Wextra (but then nobody would see it, probably...)? OK for trunk? Best regards Thomas aren't we wasting memory here? @@ -2023,6 +2023,10 @@ typedef struct gfc_symbol scope. Used in the suppression of uninitialized warnings in reallocation on assignment. */ unsigned allocated_in_scope:1; + /* Set if an external dummy argument is called with different argument lists. + This is legal in Fortran, but can cause problems with autogenerated + C prototypes for C23. */ + unsigned ext_dummy_arglist_mismatch; I assume you wanted a single bit instead of a full unsigned, i.e.: + unsigned ext_dummy_arglist_mismatch:1; Harald
Break false dependency chains on zen5
Hi, Zen5 on some variants has false dependency on tzcnt, blsi, blsr and blsmsk instructions. Those can be tested by the following benchmark jh@shroud:~> cat ee.c int main() { int a = 10; int b = 0; for (int i = 0; i < 10; i++) { #ifdef BREAK asm volatile ("xor %0, %0": "=r" (b)); #endif asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a)); } return 0; } jh@shroud:~> cat bmk.sh gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out jh@shroud:~> sh bmk.sh tzcnt real0m0.886s user0m0.886s sys 0m0.000s real0m0.886s user0m0.886s sys 0m0.000s jh@shroud:~> sh bmk.sh blsi real0m0.979s user0m0.979s sys 0m0.000s real0m2.418s user0m2.418s sys 0m0.000s jh@shroud:~> sh bmk.sh blsr real0m0.986s user0m0.986s sys 0m0.000s real0m2.422s user0m2.421s sys 0m0.000s jh@shroud:~> sh bmk.sh blsmsk real0m0.973s user0m0.973s sys 0m0.000s real0m2.422s user0m2.422s sys 0m0.000s So zen5 tested seems to have problem with bls* but not with tzcnt. I tested zen1-zen4 and they do not seem to be affected. This can be mitigated by adding a splitter that first clear the destination register. We already have runable that controls tzcnt together with lzcnt and popcnt. Since it seems that only tzcnt is affected I added new tunable to control tzcnt only. I also added splitters for blsi/blsr/blsmsk implemented analogously to existing splitter for lzcnt. The patch is neutral on SPEC. We produce blsi and blsr in some internal loops, but they usually have same destination as source. However it is good to break the dependency chain to avoid patogolical cases and it is quite cheap overall, so I think we want to enable this for generic. I will send followup patch for this. Bootstrapped/regtested x86_64-linux, will commit it shortly. gcc/ChangeLog: * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro. (TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro. * config/i386/i386.md (*bmi_blsi_): Add splitter for false dependency. (*bmi_blsi__ccno): Add splitter for false dependency. (*bmi_blsi__falsedep): New pattern. (*bmi_blsmsk_): Add splitter for false dependency. (*bmi_blsmsk__falsedep): New pattern. (*bmi_blsr_): Add splitter for false dependency. (*bmi_blsr__cmp): Add splitter for false dependency (*bmi_blsr__cmp_falsedep): New pattern. * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune. (X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune. gcc/testsuite/ChangeLog: * gcc.target/i386/blsi.c: New test. * gcc.target/i386/blsmsk.c: New test. * gcc.target/i386/blsr.c: New test. diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 40b1aa4e6df..c89f493d180 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -455,6 +455,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; ix86_tune_features[X86_TUNE_ADJUST_UNROLL] #define TARGET_AVOID_FALSE_DEP_FOR_BMI \ ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BMI] +#define TARGET_AVOID_FALSE_DEP_FOR_TZCNT \ + ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT] +#define TARGET_AVOID_FALSE_DEP_FOR_BLS \ + ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BLS] #define TARGET_ONE_IF_CONV_INSN \ ix86_tune_features[X86_TUNE_ONE_IF_CONV_INSN] #define TARGET_AVOID_MFENCE ix86_tune_features[X86_TUNE_AVOID_MFENCE] diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 8575fbf40fe..b1cd52382a8 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -20993,7 +20993,8 @@ (ctz:SWI48 (match_dup 1)))] "TARGET_BMI" "tzcnt{}\t{%1, %0|%0, %1}"; - "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed + "&& (TARGET_AVOID_FALSE_DEP_FOR_BMI || TARGET_AVOID_FALSE_DEP_FOR_TZCNT) + && epilogue_completed && optimize_function_for_speed_p (cfun) && !reg_mentioned_p (operands[0], operands[1])" [(parallel @@ -21060,7 +21061,8 @@ return "bsf{}\t{%1, %0|%0, %1}"; } "(TARGET_BMI || TARGET_CPU_P (GENERIC)) - && TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed + && (TARGET_AVOID_FALSE_DEP_FOR_BMI || TARG
[PATCH] ARM ACLE: add inline definitions for __fma and __fmaf in aarch64 and aarch32 headers
Hi GCC team, This patch introduces inline definitions for the __fma and __fmaf functions in the ARM ACLE headers for both aarch64 and arm targets. The new implementations use the built-in functions (__builtin_fma and __builtin_fmaf) to ensure proper inlining and adherence to the ARM ACLE requirements[1]. Changes include: - In gcc/config/aarch64/arm_acle.h: Added inline definitions for __fma and __fmaf. - In gcc/config/arm/arm_acle.h: Added inline definitions for __fma and __fmaf. These changes have been tested locally, and I have verified that they integrate smoothly with the existing ARM backend configurations. Please let me know if you have any questions or need further modifications. Thanks, Ayan [1] ARM ACLE Document: https://arm-software.github.io/acle/main/acle.html#fused-multiply-accumulate-fma Signed-off-by: Ayan Shafqat diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h index 7976c117daf..d9e2401ea9f 100644 --- a/gcc/config/aarch64/arm_acle.h +++ b/gcc/config/aarch64/arm_acle.h @@ -129,6 +129,20 @@ __jcvt (double __a) #pragma GCC pop_options +__extension__ extern __inline double +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__fma (double __x, double __y, double __z) +{ + return __builtin_fma (__x, __y, __z); +} + +__extension__ extern __inline float +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__fmaf (float __x, float __y, float __z) +{ + return __builtin_fmaf (__x, __y, __z); +} + #pragma GCC push_options #pragma GCC target ("+nothing+frintts") __extension__ extern __inline float diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h index c6c03fdce27..256710a2c31 100644 --- a/gcc/config/arm/arm_acle.h +++ b/gcc/config/arm/arm_acle.h @@ -829,6 +829,20 @@ __crc32cd (uint32_t __a, uint64_t __b) #endif /* __ARM_FEATURE_CRC32 */ #pragma GCC pop_options +__extension__ extern __inline double +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__fma (double __x, double __y, double __z) +{ + return __builtin_fma (__x, __y, __z); +} + +__extension__ extern __inline float +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__fmaf (float __x, float __y, float __z) +{ + return __builtin_fmaf (__x, __y, __z); +} + #ifdef __cplusplus } #endif
Re: [PATCH 1/2] __builtin_bswapXX: improve docs and tests
Oscar Gustafsson writes: > ChangeLog: > > * doc/extend.texi: Improve example for __builtin_swap16. > > --- > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index ec9bb59900c..83f6e45170b 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -16338,7 +16338,7 @@ Returns the first argument raised to the power of > the second. Unlike the > > @defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})} > Returns @var{x} with the order of the bytes reversed; for example, > -@code{0xaabb} becomes @code{0xbbaa}. Byte here always means > +@code{0xabcd} becomes @code{0xcdab}. Byte here always means > exactly 8 bits. > @enddefbuiltin Thanks for the patch. I agree this example is clearer, so I've pushed it to trunk. I'll push the testsuite update tomorrow if there are no objections. Richard
Re: [3/3 PATCH v6]middle-end: delay checking for alignment to load [PR118464]
Richard Biener writes: > On Tue, 4 Mar 2025, Tamar Christina wrote: > >> Hi All, >> >> This fixes two PRs on Early break vectorization by delaying the safety >> checks to >> vectorizable_load when the VF, VMAT and vectype are all known. >> >> This patch does add two new restrictions: >> >> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven >>group sizes, as they are unaligned every n % 2 iterations and so may cross >>a page unwittingly. Sorry for the drive-by comment, but: it might be worth updating the commit message to say non-power-of-2, rather than uneven. The patch uses the right check, but the message made it sound like it didn't. Thanks, Richard >> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization >> if >>we cannot peel for alignment, as the alignment requirement is quite large >> at >>GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we >>don't support it for now. >> >> There are other steps documented inside the code itself so that the reasoning >> is next to the code. >> >> As a fall-back, when the alignment fails we require partial vector support. >> >> For VLA targets like SVE return element alignment as the desired vector >> alignment. This means that the loads are never misaligned and so annoying it >> won't ever need to peel. >> >> So what I think needs to happen in GCC 16 is that. >> >> 1. during vect_compute_data_ref_alignment we need to take the max of >>POLY_VALUE_MIN and vector_alignment. >> >> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add >> a >>check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use >> as a >>proxy for pagesize. >> >> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in >>vect_determine_partial_vectors_and_peeling since the first iteration has >> to >>be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to >>vectorize. >> >> 4. Create a default mask to be used, so that >> vect_use_loop_mask_for_alignment_p >>becomes true and we generate the peeled check through loop control for >>partial loops. From what I can tell this won't work for >>LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support >> at >>all in the compiler. That would need to be done independently from the >>above. >> >> In any case, not GCC 15 material so I've kept the WIP patches I have >> downstream. >> >> Bootstrapped Regtested on aarch64-none-linux-gnu, >> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu >> -m32, -m64 and no issues. >> >> Ok for master? > > OK. > > Thanks, > Richard. > >> Thanks, >> Tamar >> >> gcc/ChangeLog: >> >> PR tree-optimization/118464 >> PR tree-optimization/116855 >> * doc/invoke.texi (min-pagesize): Update docs with vectorizer use. >> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay >> checks. >> (vect_compute_data_ref_alignment): Remove alignment checks and move to >> get_load_store_type, increase group access alignment. >> (vect_enhance_data_refs_alignment): Add note to comment needing >> investigating. >> (vect_analyze_data_refs_alignment): Likewise. >> (vect_supportable_dr_alignment): For group loads look at first DR. >> * tree-vect-stmts.cc (get_load_store_type): >> Perform safety checks for early break pfa. >> * tree-vectorizer.h (dr_set_safe_speculative_read_required, >> dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New. >> (need_peeling_for_alignment): Renamed to... >> (safe_speculative_read_required): .. This >> (class dr_vec_info): Add scalar_access_known_in_bounds. >> >> gcc/testsuite/ChangeLog: >> >> PR tree-optimization/118464 >> PR tree-optimization/116855 >> * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the >> load type is relaxed later. >> * gcc.dg/vect/vect-early-break_121-pr114081.c: Update. >> * gcc.dg/vect/vect-early-break_22.c: Require partial vectors. >> * gcc.dg/vect/vect-early-break_128.c: Likewise. >> * gcc.dg/vect/vect-early-break_26.c: Likewise. >> * gcc.dg/vect/vect-early-break_43.c: Likewise. >> * gcc.dg/vect/vect-early-break_44.c: Likewise. >> * gcc.dg/vect/vect-early-break_2.c: Require load_lanes. >> * gcc.dg/vect/vect-early-break_7.c: Likewise. >> * g++.dg/vect/vect-early-break_7-pr118464.cc: New test. >> * gcc.dg/vect/vect-early-break_132-pr118464.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa1.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa11.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa10.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa2.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa3.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa4.c: New test. >> * gcc.dg/vect/vect-early-break_133_pfa5.c: New test. >
Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
On 04/03/2025 14:49, Torbjorn SVENSSON wrote: > > > On 2025-03-03 18:00, Richard Earnshaw wrote: >> On 28/02/2025 16:18, Richard Earnshaw wrote: >>> On 28/02/2025 16:12, Richard Earnshaw wrote: On 08/11/2024 18:47, Torbjörn SVENSSON wrote: > Ok for trunk and releases/gcc-14? > > -- > > A long time ago, this test forced -march=armv6. > > With -marm, the generated assembler is: > foo: > sub r0, r0, #48 > cmp r0, #9 > movhi r0, #0 > movls r0, #1 > bx lr > > With -mthumb, the generated assembler is: > foo: > subs r0, r0, #48 > movs r2, #9 > uxtb r3, r0 > movs r0, #0 > cmp r2, r3 > adcs r0, r0, r0 > uxtb r0, r0 > bx lr > Looking at this code, I think both the uxtb instructions are unnecessary. For the first UXTB. On entry, 'c' is a char (unsigned) so the value is passed by the caller in a 32-bit reg, but range-limited (by the ABI) to values between 0 and 255. We subtract 48 from that, so the range now, is from -48 to 207, but we're going to look at this as an unsigned value, so it's effectively 0..207 or UINT_MAX-48..UINT_MAX. The UXTB instruction then converts the range so that the range becomes 0..255 again (but importantly, the values between UINT_MAX-48 and UINT_MAX are mapped to the range 208..255. We then do an unsigned comparison between that value and the constant 9 to test whether the result is >= 9. Now, importantly, all the values that are in the range 208..255 are >= 9, but were also >= 9 before the UXTB instruction. In effect, the UXTB is completely redundant. >>> I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That >>> does affect the rest of the argument. >>> >>> R. >>> The second UXTB is even more egregious. We have 0 in r0 and we then conditionally add 1 to it, so r0 is in the range 0..1. Zero extending that with a UXTB is therefore clearly pointless. So neither of the UXTB instructions should be present, even if generating Thumb1 code. I think the failures are, therefore, real and we should look to fix the compiler rather than 'fix' the scope of the test. R. > Require effective-target arm_arm_ok to skip the test for thumb-only > targets (Cortex-M). > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/unsigned-extend-1.c: Use effective-target > arm_arm_ok. > > Signed-off-by: Torbjörn SVENSSON > --- > gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ > testsuite/gcc.target/arm/unsigned-extend-1.c > index 3b4ab048fb0..73f2e1a556d 100644 > --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c > +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c > @@ -1,4 +1,5 @@ > /* { dg-do compile } */ > +/* { dg-require-effective-target arm_arm_ok } */ > /* { dg-options "-O2" } */ > unsigned char foo (unsigned char c) >>> >> >> I've just pushed a patch to eliminate one of the redundant zero-extends. >> The other is harder to eliminate, so I've adjusted the test to xfail when >> compiling for thumb1. > > Hi Richard, > > Can this be backported to releases/gcc-14 too? > I can do the backport if you just approve it. > > Kind regards, > Torbjörn By all means backport the testsuite change, but not the change to thumb1.md, please. R.
Re: [PATCH][stage1] tree-optimization/119103 - missed overwidening detection for shift
Richard Biener writes: > When vectorizing a shift of u16 data by an amount that's known to > be less than 16 we currently fail to emit a vector u16 shift. The > first reason is that the promotion of the shift amount is hoisted > only by PRE and that cannot preserve range info, the second reason > is that pattern detection doesn't use range info when computing > the precision required for an operation. > > The following addresses the first issue by making LIM hoist all > expressions for the pass that runs right before PRE and the > second issue by querying ranges for the shift amount. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > PR tree-optimization/119103 > * tree-ssa-loop-im.cc (in_loop_pipeline): Globalize. > (compute_invariantness): Override costing when we run > right before PRE and PRE is enabled. > (pass_lim::execute): Adjust. > * tree-vect-patterns.cc (vect_determine_precisions_from_users): > For variable shift amounts use range information. > > * gcc.target/i386/pr119103.c: New testcase. Nice! > [...] > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index 4f0a7ea162b..09408359cf2 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -6544,10 +6544,19 @@ vect_determine_precisions_from_users (stmt_vec_info > stmt_info, gassign *stmt) > case RSHIFT_EXPR: >{ > tree shift = gimple_assign_rhs2 (stmt); > - if (TREE_CODE (shift) != INTEGER_CST > - || !wi::ltu_p (wi::to_widest (shift), precision)) > + unsigned int const_shift; > + wide_int min_shift, max_shift; > + if (TREE_CODE (shift) == SSA_NAME > + && vect_get_range_info (shift, &min_shift, &max_shift) > + && wi::ge_p (min_shift, 0, TYPE_SIGN (TREE_TYPE (shift))) > + && wi::lt_p (max_shift, TYPE_PRECISION (type), > + TYPE_SIGN (TREE_TYPE (shift > + const_shift = max_shift.to_uhwi (); Don't we need to use min_shift rather than max_shift for LSHIFT_EXPR's: /* We need CONST_SHIFT fewer bits of the input. */ min_input_precision = (MAX (operation_precision, const_shift) - const_shift); ? If so, I guess that means replacing const_shift with a minimum and a maximum. Thanks, Richard > + else if (TREE_CODE (shift) == INTEGER_CST > + && wi::ltu_p (wi::to_widest (shift), precision)) > + const_shift = TREE_INT_CST_LOW (shift); > + else > return; > - unsigned int const_shift = TREE_INT_CST_LOW (shift); > if (code == LSHIFT_EXPR) > { > /* Avoid creating an undefined shift.
[PATCH] middle-end/97323 - TYPE_CANONICAL vs. ARRAY_TYPE modes
For strict-alignment targets we can end up with BLKmode single-element array types when the element type is unaligned. This confuses type checking since the canonical type would have an aligned element type and a non-BLKmode mode. The following simply ignores the mode we assign to array types for this purpose, like we already do for record and union types. Bootstrap and regtest running on x86_64-unknown-linux-gnu. PR middle-end/97323 * tree.cc (gimple_canonical_types_compatible_p): Ignore TYPE_MODE also for ARRAY_TYPE. (verify_type): Likewise. * gcc.dg/pr97323.c: New testcase. --- gcc/testsuite/gcc.dg/pr97323.c | 5 + gcc/tree.cc| 2 ++ 2 files changed, 7 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr97323.c diff --git a/gcc/testsuite/gcc.dg/pr97323.c b/gcc/testsuite/gcc.dg/pr97323.c new file mode 100644 index 000..8845a34fa15 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr97323.c @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-g" } */ + +typedef int a __attribute__((aligned(2))); +a b[1]; diff --git a/gcc/tree.cc b/gcc/tree.cc index 0743ed71c78..4e855b4b2d3 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -13982,6 +13982,7 @@ gimple_canonical_types_compatible_p (const_tree t1, const_tree t2, flexible array members, we allow mismatching modes for structures or unions. */ if (!RECORD_OR_UNION_TYPE_P (t1) + && TREE_CODE (t1) != ARRAY_TYPE && TYPE_MODE (t1) != TYPE_MODE (t2)) return false; @@ -14313,6 +14314,7 @@ verify_type (const_tree t) flexible array members. */ && !RECORD_OR_UNION_TYPE_P (t) && !RECORD_OR_UNION_TYPE_P (TYPE_CANONICAL (t)) + && TREE_CODE (t) != ARRAY_TYPE && TYPE_MODE (t) != TYPE_MODE (TYPE_CANONICAL (t))) { error ("% of % is not compatible"); -- 2.43.0
Re: [Fortran, Patch, PR103391, v1] Fix gimple error on assign to pointer array.
Hi Paul, thanks for the review. Committed as gcc-15-7812-g04909c7ecc0. Yes please: you do the backport. Thank you very much. I am looking at pr104684 at the moment, which is the last regression on my list (i.e. that I am assigned or cc'ed to). When the regression test for this is fine, I will continue with the coarray teams stuff. If you find any regressions I have caused or you don't find the solution for, just add me to cc. Thanks again, Andre On Tue, 4 Mar 2025 13:46:33 + Paul Richard Thomas wrote: > Hi Andre, > > You beat me to it! I had just noticed the missing indirect reference. Yes, > this is good for mainline and backporting to 14-branch at very least. > > Thanks for the patch. If you want to do the push to mainline, I will do the > backporting if you like? I'll not be back at base for a little while so > there will be a suitable delay. > I have several Steinmetz sourced regression fixes to bring to the list in > the next days. > > Cheers > > Paul > > > On Tue, 4 Mar 2025 at 12:02, Andre Vehreschild wrote: > > > Hi all, > > > > attached patch fixes a 12-regression when an assignment to a pointer array > > is > > done. The issue was a missing indirect ref on assign as it was already > > done for > > allocatable arrays. > > > > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? > > > > Regards, > > Andre > > -- > > Andre Vehreschild * Email: vehre ad gmx dot de > > -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH v4] libstdc++: implement constexpr memory algorithms
On Fri, 28 Feb 2025, Giuseppe D'Angelo wrote: > Hello, > > On 27/02/2025 15:34, Jonathan Wakely wrote: > > On 26/02/25 17:27 +0100, Giuseppe D'Angelo wrote: > > > On 26/02/2025 16:33, Giuseppe D'Angelo wrote: > > > > Whops, sorry, missed this sub-thread (while replying to the other one). > > > > Change of plans then, I'll amend and remove the ad-hoc constexpr macro. > > > > > > Done, v3 attached. > > > > > > Thanks, > > > -- > > > Giuseppe D'Angelo > > > > > > > > From de3751a38330f508be9f08b77136a31481018828 Mon Sep 17 00:00:00 2001 > > > From: Giuseppe D'Angelo > > > Date: Sun, 16 Feb 2025 19:37:07 +0100 > > > Subject: [PATCH] libstdc++: implement constexpr memory algorithms > > > > > > This commit adds support for C++26's constexpr specialized memory > > > algorithms, introduced by P2283R2, P3508R0, P3369R0. > > > > > > The uninitialized_default, value, copy, move and fill algorithms are > > > affected, in all of their variants (iterator-based, range-based and _n > > > versions.) > > > > > > The changes are mostly mechanical -- add `constexpr` to a number of > > > signatures when compiling in C++26 and above modes. The internal helper > > > guard class for range algorithms instead can be marked unconditionally. > > > > > > The only "real" change to the implementation of the algorithms is that > > > during constant evaluation I need to dispatch to a constexpr-friendly > > > version of them. > > > > > > For each algorithm family I've added only one test to cover it and its > > > variants; the idea is to avoid too much repetition and simplify future > > > maintenance. > > > > The patch itself looks good, but I have some comments on the ChangeLog > > part. > > I'm adding a v4 because I discovered that one more tweak was necessary to an > existing test of the header synopsis... > > > These are thigns I would have tweaked myself before pushing, but > > now that you have commit access ... > > > > > libstdc++-v3/ChangeLog: > > > > > > * include/bits/ranges_uninitialized.h: Mark the specialized > > > memory algorithms as constexpr in C++26. Also mark the members > > > of the _DestroyGuard helper class. > > > * include/bits/stl_uninitialized.h: Ditto. > > > * include/bits/stl_construct.h: Mark _Construct_novalue (which > > > uses placement new to do default initialization) as constexpr > > > in C++26. This is possible due to P2747R2, which GCC already > > > implements; other compilers in C++26 modes already implement > > > P2448R2, so there should be no issues there. > > > > Per-file ChangeLog entries should say "what, not why" ... for reasons. > > The best reasons :) > > > > That makes old GNU-style ChangeLog files sometimes not very useful. > > But at least we now have Git commit messages where we can put all the > > detailed background, rationale etc. > > > > So the explanation that it's possible to make it constexpr due to > > P2448R2 should be above in the free text part of the commit message > > above, along with the parenthesis saying what the function does. > > > > Also, please mention the changed function by name, i.e. > > > > * include/bits/stl_construct.h (_Construct_novalue): Mark > > constexpr for C++26. > > > > See > > https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html > > for the format. > > > > For the changes in ranges_uninitialized.h and stl_uninitialized.h it's > > basically "all the functions" so I think it's OK not to name them all. > > (It would also be OK if you did name them all individually, but I > > don't think it's necessary.) > > > > > * include/bits/version.def: Bump the feature-testing macro. > > > > Please mention which macro, e.g. something like: > > > > * include/bits/version.def (raw_memory_algorithms): Bump > > value. > > > > > * include/bits/version.h: Regenerate. > > > > (No need to mention the specific macro by name here, because the whole > > file was regenerated so this is fine.) > > > > > * testsuite/20_util/specialized_algorithms/feature_test_macro.cc: New > > > test. > > > > Please add a line break after the colon here (and the lines below). We > > try to keep ChangeLog entries below about 78 columns, but for > > libstdc++ tests that's often impossible because the pathname is > > already longer than that! But we can still put the "New test." on a > > new line. > > The contrib/mklog.py script has a maximum column width of 100, should it be > amended to 78? (I think that's what happened here, the script generated the > overlong line and I didn't touch it.) > > > > > > Feel free to push to trunk with those changes to the ChangeLog lines > > (and add Reviewed-by: tags for me and/or Patrick if you like). If > > you'd feel more comfortable sending it to the mailing list again for a > > final check, that's fine too. > > > > Either way, the traditional first commit when you've got write after > > approval access is to add yourself to the MAINTAINERS file, as per: > > http
Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])
Yeah I didn't know how to articulate it (and perhaps this still requires clarification) Say we have following // reduced version of gcc.target/riscv/rvv/base/float-point-frm-run-1.c main set_frm (4); // orig global FRM update test_float_point_frm_run_1 (op1, op2, vl) set_frm (0); result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl); assert_equal (1, get_frm ()) // restore global 4 before returning assert_equal (4, get_frm () <-- here call restores global vs. // reduced version of gcc.target/riscv/rvv/base/float-point-frm-run-5.c main set_frm (1); // orig global FRM update test_float_point_frm_run_1 (op1, op2, vl) other_function() set_frm (2); // also global update assert_equal (2, get_frm () <-- here call doesn't restore There's no explicit annotation or anything about the FRM, yet in 2nd example other_function () we consider frm write a global event but not in test_float_point_frm_run_1 () in 1st example: I'm missing words how to explain that :-) AFAIR the general idea is that every function can potentially "clobber", i.e. set FRM globally. The intent of the first test is to save and restore the global rounding mode because we modify it locally by a rounding-mode changing intrinsic. However, as you note, the set_frm at the beginning of test_float_point_frm_run_1 has no effect. What should intuitively(?) happen is that it sets a new global FRM that we restore upon exit. We, however, restore the previous global FRM. I believe that's because set_frm is an inline assembly, inlined into test_float_point_frm_run_1 and we expect inline assemblies to _not_ change the rounding mode. That was one of my concerns during the review some time in 2023. If set_frm were a non-inlined function we'd need to assume it changes the rounding mode and restore the value it has after the set_frm call. And that's also what is different in frm-run-5.c (your second snippet). We wrap the set_frm call in another non-inlinable call. Thus, we need to assume the global rounding mode changes and save/restore. I guess it was a deliberate decision to handle asm snippets different. At least as long as they are in the same visible scope it's obvious when they're changing the rounding mode. One could argue that users doing that are on their own. Not sure that argument has worked very well, historically, though ;) -- Regards Robin
Re: [3/3 PATCH v6]middle-end: delay checking for alignment to load [PR118464]
On Tue, 4 Mar 2025, Tamar Christina wrote: > Hi All, > > This fixes two PRs on Early break vectorization by delaying the safety checks > to > vectorizable_load when the VF, VMAT and vectype are all known. > > This patch does add two new restrictions: > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven >group sizes, as they are unaligned every n % 2 iterations and so may cross >a page unwittingly. > > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization > if >we cannot peel for alignment, as the alignment requirement is quite large > at >GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we >don't support it for now. > > There are other steps documented inside the code itself so that the reasoning > is next to the code. > > As a fall-back, when the alignment fails we require partial vector support. > > For VLA targets like SVE return element alignment as the desired vector > alignment. This means that the loads are never misaligned and so annoying it > won't ever need to peel. > > So what I think needs to happen in GCC 16 is that. > > 1. during vect_compute_data_ref_alignment we need to take the max of >POLY_VALUE_MIN and vector_alignment. > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a >check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use > as a >proxy for pagesize. > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in >vect_determine_partial_vectors_and_peeling since the first iteration has to >be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to >vectorize. > > 4. Create a default mask to be used, so that > vect_use_loop_mask_for_alignment_p >becomes true and we generate the peeled check through loop control for >partial loops. From what I can tell this won't work for >LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at >all in the compiler. That would need to be done independently from the >above. > > In any case, not GCC 15 material so I've kept the WIP patches I have > downstream. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? OK. Thanks, Richard. > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/118464 > PR tree-optimization/116855 > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use. > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay > checks. > (vect_compute_data_ref_alignment): Remove alignment checks and move to > get_load_store_type, increase group access alignment. > (vect_enhance_data_refs_alignment): Add note to comment needing > investigating. > (vect_analyze_data_refs_alignment): Likewise. > (vect_supportable_dr_alignment): For group loads look at first DR. > * tree-vect-stmts.cc (get_load_store_type): > Perform safety checks for early break pfa. > * tree-vectorizer.h (dr_set_safe_speculative_read_required, > dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New. > (need_peeling_for_alignment): Renamed to... > (safe_speculative_read_required): .. This > (class dr_vec_info): Add scalar_access_known_in_bounds. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/118464 > PR tree-optimization/116855 > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the > load type is relaxed later. > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update. > * gcc.dg/vect/vect-early-break_22.c: Require partial vectors. > * gcc.dg/vect/vect-early-break_128.c: Likewise. > * gcc.dg/vect/vect-early-break_26.c: Likewise. > * gcc.dg/vect/vect-early-break_43.c: Likewise. > * gcc.dg/vect/vect-early-break_44.c: Likewise. > * gcc.dg/vect/vect-early-break_2.c: Require load_lanes. > * gcc.dg/vect/vect-early-break_7.c: Likewise. > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test. > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa11.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test. > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test. > * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment. > * gcc
[PATCH][stage1] tree-optimization/119103 - missed overwidening detection for shift
When vectorizing a shift of u16 data by an amount that's known to be less than 16 we currently fail to emit a vector u16 shift. The first reason is that the promotion of the shift amount is hoisted only by PRE and that cannot preserve range info, the second reason is that pattern detection doesn't use range info when computing the precision required for an operation. The following addresses the first issue by making LIM hoist all expressions for the pass that runs right before PRE and the second issue by querying ranges for the shift amount. Bootstrapped and tested on x86_64-unknown-linux-gnu. PR tree-optimization/119103 * tree-ssa-loop-im.cc (in_loop_pipeline): Globalize. (compute_invariantness): Override costing when we run right before PRE and PRE is enabled. (pass_lim::execute): Adjust. * tree-vect-patterns.cc (vect_determine_precisions_from_users): For variable shift amounts use range information. * gcc.target/i386/pr119103.c: New testcase. --- gcc/testsuite/gcc.target/i386/pr119103.c | 13 + gcc/tree-ssa-loop-im.cc | 10 -- gcc/tree-vect-patterns.cc| 15 --- 3 files changed, 33 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr119103.c diff --git a/gcc/testsuite/gcc.target/i386/pr119103.c b/gcc/testsuite/gcc.target/i386/pr119103.c new file mode 100644 index 000..57210dc3bbe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119103.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2" } */ + +void lshift(unsigned short *x, unsigned char amount) +{ + if (amount > 15) +__builtin_unreachable(); + + for (int i = 0; i < 16; i++) +x[i] <<= amount; +} + +/* { dg-final { scan-assembler "vpsllw" } } */ diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc index 225964c6215..a3ca5af3e3e 100644 --- a/gcc/tree-ssa-loop-im.cc +++ b/gcc/tree-ssa-loop-im.cc @@ -143,6 +143,8 @@ public: different modes. */ }; +static bool in_loop_pipeline; + /* We use six bits per loop in the ref->dep_loop bitmap to record the dep_kind x dep_state combinations. */ @@ -1239,7 +1241,11 @@ compute_invariantness (basic_block bb) lim_data->cost); } - if (lim_data->cost >= LIM_EXPENSIVE) + if (lim_data->cost >= LIM_EXPENSIVE + /* When we run before PRE and PRE is active hoist all expressions +since PRE would do so anyway and we can preserve range info +but PRE cannot. */ + || (flag_tree_pre && !in_loop_pipeline)) set_profitable_level (stmt); } } @@ -3759,7 +3765,7 @@ public: unsigned int pass_lim::execute (function *fun) { - bool in_loop_pipeline = scev_initialized_p (); + in_loop_pipeline = scev_initialized_p (); if (!in_loop_pipeline) loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS); diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 4f0a7ea162b..09408359cf2 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -6544,10 +6544,19 @@ vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt) case RSHIFT_EXPR: { tree shift = gimple_assign_rhs2 (stmt); - if (TREE_CODE (shift) != INTEGER_CST - || !wi::ltu_p (wi::to_widest (shift), precision)) + unsigned int const_shift; + wide_int min_shift, max_shift; + if (TREE_CODE (shift) == SSA_NAME + && vect_get_range_info (shift, &min_shift, &max_shift) + && wi::ge_p (min_shift, 0, TYPE_SIGN (TREE_TYPE (shift))) + && wi::lt_p (max_shift, TYPE_PRECISION (type), +TYPE_SIGN (TREE_TYPE (shift + const_shift = max_shift.to_uhwi (); + else if (TREE_CODE (shift) == INTEGER_CST +&& wi::ltu_p (wi::to_widest (shift), precision)) + const_shift = TREE_INT_CST_LOW (shift); + else return; - unsigned int const_shift = TREE_INT_CST_LOW (shift); if (code == LSHIFT_EXPR) { /* Avoid creating an undefined shift. -- 2.43.0
[Fortran, Patch, PR104684, v1] Fix gimple error when pointer assigning alloc array.
Hi all, attached patch fixes a gimplification fault when a pointer assignment to an allocatable array is done. The tree type is different, because of that flag in the lang_specific structure. View-converting the type fixes the issue. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 3acd5266f70c29d6b2b3078e122f61f6537b602d Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Tue, 4 Mar 2025 17:06:31 +0100 Subject: [PATCH] Fortran: Add view convert to pointer assign when only pointer/alloc attr differs [PR104684] PR fortran/104684 gcc/fortran/ChangeLog: * trans-array.cc (gfc_conv_expr_descriptor): Look at the lang-specific akind and do a view convert when only the akind attribute differs between pointer and allocatable array. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/ptr_comp_6.f08: New test. --- gcc/fortran/trans-array.cc| 10 +++- .../gfortran.dg/coarray/ptr_comp_6.f08| 25 +++ 2 files changed, 34 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08 diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 6a00d26cb2f..925030465ac 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -8186,8 +8186,16 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr) { if (se->direct_byref && !se->byref_noassign) { + struct lang_type *lhs_ls + = TYPE_LANG_SPECIFIC (TREE_TYPE (se->expr)), + *rhs_ls = TYPE_LANG_SPECIFIC (TREE_TYPE (desc)); + /* When only the array_kind differs, do a view_convert. */ + tmp = lhs_ls && rhs_ls && lhs_ls->rank == rhs_ls->rank + && lhs_ls->akind != rhs_ls->akind + ? build1 (VIEW_CONVERT_EXPR, TREE_TYPE (se->expr), desc) + : desc; /* Copy the descriptor for pointer assignments. */ - gfc_add_modify (&se->pre, se->expr, desc); + gfc_add_modify (&se->pre, se->expr, tmp); /* Add any offsets from subreferences. */ gfc_get_dataptr_offset (&se->pre, se->expr, desc, NULL_TREE, diff --git a/gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08 b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08 new file mode 100644 index 000..397a09bc8bc --- /dev/null +++ b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08 @@ -0,0 +1,25 @@ +!{ dg-do run } +! +! Contributed by Arseny Solokha + +program pr104684 + type :: index_map +integer, allocatable :: send_index(:) + end type + type(index_map) :: imap + + imap%send_index = [5,4,3] + call sub(imap) +contains + subroutine sub(this) +type(index_map), intent(inout), target :: this +type :: box + integer, pointer :: array(:) +end type +type(box), allocatable :: buffer[:] +allocate(buffer[*]) +buffer%array => this%send_index +if (any(buffer%array /= [5,4,3])) stop 1 + end subroutine +end program + -- 2.48.1
Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
On 2025-03-04 16:44, Richard Earnshaw (lists) wrote: On 04/03/2025 14:49, Torbjorn SVENSSON wrote: On 2025-03-03 18:00, Richard Earnshaw wrote: On 28/02/2025 16:18, Richard Earnshaw wrote: On 28/02/2025 16:12, Richard Earnshaw wrote: On 08/11/2024 18:47, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- A long time ago, this test forced -march=armv6. With -marm, the generated assembler is: foo: sub r0, r0, #48 cmp r0, #9 movhi r0, #0 movls r0, #1 bx lr With -mthumb, the generated assembler is: foo: subs r0, r0, #48 movs r2, #9 uxtb r3, r0 movs r0, #0 cmp r2, r3 adcs r0, r0, r0 uxtb r0, r0 bx lr Looking at this code, I think both the uxtb instructions are unnecessary. For the first UXTB. On entry, 'c' is a char (unsigned) so the value is passed by the caller in a 32-bit reg, but range-limited (by the ABI) to values between 0 and 255. We subtract 48 from that, so the range now, is from -48 to 207, but we're going to look at this as an unsigned value, so it's effectively 0..207 or UINT_MAX-48..UINT_MAX. The UXTB instruction then converts the range so that the range becomes 0..255 again (but importantly, the values between UINT_MAX-48 and UINT_MAX are mapped to the range 208..255. We then do an unsigned comparison between that value and the constant 9 to test whether the result is >= 9. Now, importantly, all the values that are in the range 208..255 are >= 9, but were also >= 9 before the UXTB instruction. In effect, the UXTB is completely redundant. I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That does affect the rest of the argument. R. The second UXTB is even more egregious. We have 0 in r0 and we then conditionally add 1 to it, so r0 is in the range 0..1. Zero extending that with a UXTB is therefore clearly pointless. So neither of the UXTB instructions should be present, even if generating Thumb1 code. I think the failures are, therefore, real and we should look to fix the compiler rather than 'fix' the scope of the test. R. Require effective-target arm_arm_ok to skip the test for thumb-only targets (Cortex-M). gcc/testsuite/ChangeLog: * gcc.target/arm/unsigned-extend-1.c: Use effective-target arm_arm_ok. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ testsuite/gcc.target/arm/unsigned-extend-1.c index 3b4ab048fb0..73f2e1a556d 100644 --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ /* { dg-options "-O2" } */ unsigned char foo (unsigned char c) I've just pushed a patch to eliminate one of the redundant zero-extends. The other is harder to eliminate, so I've adjusted the test to xfail when compiling for thumb1. Hi Richard, Can this be backported to releases/gcc-14 too? I can do the backport if you just approve it. Kind regards, Torbjörn By all means backport the testsuite change, but not the change to thumb1.md, please. Partial backport done in r14.2.0-853-g7fb1d7bff18. Kind regards, Torbjörn
Re: The COBOL front end, version 3, now in 14 easy pieces
On Tue, 4 Mar 2025 00:08:16 +0100 Jakub Jelinek wrote: > On Mon, Mar 03, 2025 at 05:21:38PM -0500, James K. Lowden wrote: > > However IMO, the incantation: > > > > make install DESTDIR=/foo > > > > is invalid. The compiler's library search path is fixed when the > > compiler is built, based on configure options. Installing into an > > arbitrary directory cannot work; there is no opportunity then to > > alter the gcobol binary. > > This is how most distros install stuff for their packaging systems, > make install DESTDIR=/some/directory > and then everything under /some/directory > is packaged into the package. > GCC is generally relocatable, the compiler driver should find the > compiler and library/include directories etc. relative to where the > driver resides in the filesystem. Thank you for taking the time to explain that, Jakub. I stand corrected, and I believe you'll find that DESTDIR works as intended using the current cobol-patched branch. It works here for me. Somehow, despite reading the automake documentation over and over these last several days while we struggled to bring the libgcobol build process into the 20th [sic] century, I didn't put 2 + 2 together. Looking at it now, I see that by using automake we get DESTDIR for free, provided (as we do) that we use libtool for installation. In short, despite not trying to support DESTDIR, we do anyway, by happy accident. And we are now better informed. Regards, --jkl
Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)
Hi Kyrill, > This restriction should be documented in invoke.texi IMO. > I also think it would be more user friendly to warn them about the > incompatibility if an explicit -moutline-atomics option is passed. > It’s okay though to silently turn off the implicit default-on option though. I've updated the documentation for -mcmodel=large so it's a little closer to reality. I've also added the same check as for other incompatible options like -fPIC - this means with explicit use of outline atomics you get an error immediately instead of an unclear linker error. Cheers, Wilco v2: Update docs, report incompatible use of -mcmodel=large and outline atomics. Outline atomics is not designed to be used with -mcmodel=large, so disable it automatically if the large code model is used. Report incompatible explicit use of outline atomics with large model. Update documentation. gcc: PR target/112465 * config/aarch64/aarch64.cc (aarch64_override_options_after_change_1): Turn off outline atomics with -mcmodel=large. (initialize_aarch64_code_model): Report incompatible use of large model and outline atomics. * doc/invoke.texi (-mcmodel=large): Update description. --- diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..8bf0c011309d1f603137556006297dae5f009a47 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts) intermediary step for the former. */ if (flag_mlow_precision_sqrt) flag_mrecip_low_precision_sqrt = true; + + /* Turn off outline atomics with -mcmodel=large. */ + if (aarch64_cmodel == AARCH64_CMODEL_LARGE) +opts->x_aarch64_flag_outline_atomics = 0; } /* 'Unpack' up the internal tuning structs and update the options @@ -19150,6 +19154,8 @@ initialize_aarch64_code_model (struct gcc_options *opts) opts->x_flag_pic > 1 ? "PIC" : "pic"); if (opts->x_aarch64_abi == AARCH64_ABI_ILP32) sorry ("code model %qs not supported in ilp32 mode", "large"); + if (opts->x_aarch64_flag_outline_atomics == 1) + sorry ("code model %qs does not support %<-moutline-atomics%>", "large"); break; case AARCH64_CMODEL_TINY_PIC: case AARCH64_CMODEL_SMALL_PIC: diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index bad49a017cc1bb0922eba9f0b2db80166d409cd7..7352da7724a60efdbdbb6b743b1f7fef258f4bc8 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21512,10 +21512,12 @@ dynamically linked. This is the default code model. @opindex mcmodel=large @item -mcmodel=large -Generate code for the large code model. This makes no assumptions about -addresses and sizes of sections. Programs can be statically linked only. The +Generate code for the large code model. This allows large .bss and .data +sections, however .text and .rodata must still be < 2GiB in total. Position +independent code or statically linked libraries are not supported. The @option{-mcmodel=large} option is incompatible with @option{-mabi=ilp32}, -@option{-fpic} and @option{-fPIC}. +@option{-fpie}, @option{-fPIE}, @option{-fpic}, @option{-fPIC}, +@option{-static}, @option{-moutline-atomics}. @item -mtp=@var{name} @opindex mtp
Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)
On Tue, Mar 4, 2025 at 11:47 PM Wilco Dijkstra wrote: > > Hi Kyrill, > > > This restriction should be documented in invoke.texi IMO. > > I also think it would be more user friendly to warn them about the > > incompatibility if an explicit -moutline-atomics option is passed. > > It’s okay though to silently turn off the implicit default-on option though. > > I've updated the documentation for -mcmodel=large so it's a little closer to > reality. > I've also added the same check as for other incompatible options like -fPIC - > this > means with explicit use of outline atomics you get an error immediately > instead of > an unclear linker error. > > Cheers, > Wilco > > > v2: Update docs, report incompatible use of -mcmodel=large and outline > atomics. > > Outline atomics is not designed to be used with -mcmodel=large, so disable > it automatically if the large code model is used. Report incompatible > explicit use of outline atomics with large model. Update documentation. > > gcc: > PR target/112465 > * config/aarch64/aarch64.cc (aarch64_override_options_after_change_1): > Turn off outline atomics with -mcmodel=large. > (initialize_aarch64_code_model): Report incompatible use of large > model > and outline atomics. > * doc/invoke.texi (-mcmodel=large): Update description. > > --- > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..8bf0c011309d1f603137556006297dae5f009a47 > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct > gcc_options *opts) > intermediary step for the former. */ >if (flag_mlow_precision_sqrt) > flag_mrecip_low_precision_sqrt = true; > + > + /* Turn off outline atomics with -mcmodel=large. */ > + if (aarch64_cmodel == AARCH64_CMODEL_LARGE) > +opts->x_aarch64_flag_outline_atomics = 0; > } > > /* 'Unpack' up the internal tuning structs and update the options > @@ -19150,6 +19154,8 @@ initialize_aarch64_code_model (struct gcc_options > *opts) >opts->x_flag_pic > 1 ? "PIC" : "pic"); >if (opts->x_aarch64_abi == AARCH64_ABI_ILP32) > sorry ("code model %qs not supported in ilp32 mode", "large"); > + if (opts->x_aarch64_flag_outline_atomics == 1) > + sorry ("code model %qs does not support %<-moutline-atomics%>", > "large"); >break; > case AARCH64_CMODEL_TINY_PIC: > case AARCH64_CMODEL_SMALL_PIC: > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index > bad49a017cc1bb0922eba9f0b2db80166d409cd7..7352da7724a60efdbdbb6b743b1f7fef258f4bc8 > 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -21512,10 +21512,12 @@ dynamically linked. This is the default code model. > > @opindex mcmodel=large > @item -mcmodel=large > -Generate code for the large code model. This makes no assumptions about > -addresses and sizes of sections. Programs can be statically linked only. > The > +Generate code for the large code model. This allows large .bss and .data > +sections, however .text and .rodata must still be < 2GiB in total. Position > +independent code or statically linked libraries are not supported. The > @option{-mcmodel=large} option is incompatible with @option{-mabi=ilp32}, > -@option{-fpic} and @option{-fPIC}. > +@option{-fpie}, @option{-fPIE}, @option{-fpic}, @option{-fPIC}, > +@option{-static}, @option{-moutline-atomics}. What's the issue with -static ? Ramana > > @item -mtp=@var{name} > @opindex mtp >
Re: [Fortran, Patch, PR104684, v1] Fix gimple error when pointer assigning alloc array.
Hi Andre, Am 04.03.25 um 17:11 schrieb Andre Vehreschild: Hi all, attached patch fixes a gimplification fault when a pointer assignment to an allocatable array is done. The tree type is different, because of that flag in the lang_specific structure. View-converting the type fixes the issue. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? this LGTM. I am wondering if this kind of fix also helps with pr107143, where pointer remapping is used, but your code is not reached in that situation. Thanks for the patch! Harald Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] c++: disable -Wnonnull in unevaluated context [PR115580]
On 3/4/25 3:26 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This PR complains that we issue a -Wnonnull even in a decltype. Since we can't use cp_unevaluated_operand in c-common.cc, this Would it make sense to check c_inhibit_evaluation_warnings instead? fix targets even -Wformat and -Wrestrict. I think that's fine. PR c++/115580 gcc/cp/ChangeLog: * call.cc (build_over_call): Don't call check_function_arguments when cp_unevaluated_operand. gcc/testsuite/ChangeLog: * g++.dg/warn/Wnonnull16.C: New test. --- gcc/cp/call.cc | 1 + gcc/testsuite/g++.dg/warn/Wnonnull16.C | 16 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Wnonnull16.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index be9b0cf62f1..87ca02507cf 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -10669,6 +10669,7 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) bool warned_p = false; if ((complain & tf_warning) + && !cp_unevaluated_operand && (warn_nonnull || warn_format || warn_suggest_attribute_format diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull16.C b/gcc/testsuite/g++.dg/warn/Wnonnull16.C new file mode 100644 index 000..8740f351ac7 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wnonnull16.C @@ -0,0 +1,16 @@ +// PR c++/115580 +// { dg-do compile { target c++11 } } + +class WithMember { +public: + int foo(); +}; + +decltype(((WithMember*)nullptr)->foo()) footype; // { dg-bogus "pointer is null" } + +int f(void*) __attribute__((nonnull)); + +void g() +{ + [[maybe_unused]] decltype(f(nullptr)) b; // { dg-bogus "non-null" } +} base-commit: 879fd9c822633ecf2c62471d1a7f9b9619e296b7
Re: [PATCH] c++: ICE with operator new[] in constexpr [PR118775]
On 2/11/25 6:24 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- Here we ICE since r11-7740 because we no longer say that (long)&a (where a is a global var) is non_constant_p. So VERIFY_CONSTANT does not return and we crash on tree_to_uhwi. We should check tree_fits_uhwi_p before calling tree_to_uhwi. PR c++/118775 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_call_expression): Check tree_fits_uhwi_p. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-new24.C: New test. * g++.dg/cpp2a/constexpr-new25.C: New test. --- gcc/cp/constexpr.cc | 7 + gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C | 25 ++ gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C | 27 3 files changed, 59 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index f142dd32bc8..f8f9a9df1a2 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -2909,6 +2909,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, gcc_assert (arg0); if (new_op_p) { + if (!tree_fits_uhwi_p (arg0)) + { + if (!ctx->quiet) + error_at (loc, "cannot allocate array: size too large"); "too large" seems misleading in this case, where it just isn't a compile-time constant. Why didn't the VERIFY_CONSTANT just above already reject this? + *non_constant_p = true; + return t; + } tree type = build_array_type_nelts (char_type_node, tree_to_uhwi (arg0)); tree var = build_decl (loc, VAR_DECL, diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C new file mode 100644 index 000..debb7f0f5c4 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C @@ -0,0 +1,25 @@ +// PR c++/118775 +// { dg-do compile { target c++20 } } + +int a; + +constexpr char * +f1 () +{ + constexpr auto p = new char[(long int) &a]; // { dg-error "size too large" } + return p; +} + +constexpr char * +f2 () +{ + auto p = new char[(long int) &a]; // { dg-error "size too large" } + return p; +} + +void +g () +{ + auto r1 = f2 (); + constexpr auto r2 = f2 (); // { dg-message "in .constexpr. expansion" } +} diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C new file mode 100644 index 000..91c0318abd8 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C @@ -0,0 +1,27 @@ +// PR c++/118775 +// { dg-do compile { target c++20 } } + +namespace std { +struct __uniq_ptr_impl { + constexpr __uniq_ptr_impl(char *) {} +}; +template struct unique_ptr { + __uniq_ptr_impl _M_t; + constexpr ~unique_ptr() {} +}; +template struct _MakeUniq; +template struct _MakeUniq<_Tp[]> { + typedef unique_ptr<_Tp[]> __array; +}; +template using __unique_ptr_array_t = _MakeUniq<_Tp>::__array; +constexpr __unique_ptr_array_t make_unique(long __num) { + return unique_ptr(new char[__num]); +} +} // namespace std +int a; +int +main () +{ + std::unique_ptr p = std::make_unique((long)&a); + constexpr std::unique_ptr p2 = std::make_unique((long)&a); // { dg-error "conversion" } +} base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2
Re: [PATCH v2 3/5][STAGE1] ctf: translate annotation DIEs to internal ctf
On 2/24/25 1:55 PM, David Faust wrote: On 2/24/25 11:04, Indu Bhagat wrote: On 2/6/25 11:54 AM, David Faust wrote: Translate DW_TAG_GNU_annotation DIEs created for C attributes btf_decl_tag and btf_type_tag into an in-memory representation in the CTF/BTF container. They will be output in BTF as BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records. The new CTF kinds used to represent these annotations, CTF_K_DECL_TAG and CTF_K_TYPE_TAG, are expected to be formalized in the next version of the CTF specification. For now they only exist in memory as a translation step to BTF, and are not emitted when generating CTF information. The patch overall looks OK to me. Some comments inlined below. David, any progress with this? Looks like the patch is in pretty good shape. Let us push hard so the patch can be merged soon? Thanks, Yonghong
Re: [PATCH] c++: ICE with operator new[] in constexpr [PR118775]
Ping. On Tue, Feb 11, 2025 at 06:24:32PM -0500, Marek Polacek wrote: > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > -- >8 -- > Here we ICE since r11-7740 because we no longer say that (long)&a > (where a is a global var) is non_constant_p. So VERIFY_CONSTANT > does not return and we crash on tree_to_uhwi. We should check > tree_fits_uhwi_p before calling tree_to_uhwi. > > PR c++/118775 > > gcc/cp/ChangeLog: > > * constexpr.cc (cxx_eval_call_expression): Check tree_fits_uhwi_p. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp2a/constexpr-new24.C: New test. > * g++.dg/cpp2a/constexpr-new25.C: New test. > --- > gcc/cp/constexpr.cc | 7 + > gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C | 25 ++ > gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C | 27 > 3 files changed, 59 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C > create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C > > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc > index f142dd32bc8..f8f9a9df1a2 100644 > --- a/gcc/cp/constexpr.cc > +++ b/gcc/cp/constexpr.cc > @@ -2909,6 +2909,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, > tree t, > gcc_assert (arg0); > if (new_op_p) > { > + if (!tree_fits_uhwi_p (arg0)) > + { > + if (!ctx->quiet) > + error_at (loc, "cannot allocate array: size too large"); > + *non_constant_p = true; > + return t; > + } > tree type = build_array_type_nelts (char_type_node, > tree_to_uhwi (arg0)); > tree var = build_decl (loc, VAR_DECL, > diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C > b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C > new file mode 100644 > index 000..debb7f0f5c4 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C > @@ -0,0 +1,25 @@ > +// PR c++/118775 > +// { dg-do compile { target c++20 } } > + > +int a; > + > +constexpr char * > +f1 () > +{ > + constexpr auto p = new char[(long int) &a]; // { dg-error "size too large" > } > + return p; > +} > + > +constexpr char * > +f2 () > +{ > + auto p = new char[(long int) &a]; // { dg-error "size too large" } > + return p; > +} > + > +void > +g () > +{ > + auto r1 = f2 (); > + constexpr auto r2 = f2 (); // { dg-message "in .constexpr. expansion" } > +} > diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C > b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C > new file mode 100644 > index 000..91c0318abd8 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C > @@ -0,0 +1,27 @@ > +// PR c++/118775 > +// { dg-do compile { target c++20 } } > + > +namespace std { > +struct __uniq_ptr_impl { > + constexpr __uniq_ptr_impl(char *) {} > +}; > +template struct unique_ptr { > + __uniq_ptr_impl _M_t; > + constexpr ~unique_ptr() {} > +}; > +template struct _MakeUniq; > +template struct _MakeUniq<_Tp[]> { > + typedef unique_ptr<_Tp[]> __array; > +}; > +template using __unique_ptr_array_t = _MakeUniq<_Tp>::__array; > +constexpr __unique_ptr_array_t make_unique(long __num) { > + return unique_ptr(new char[__num]); > +} > +} // namespace std > +int a; > +int > +main () > +{ > + std::unique_ptr p = std::make_unique((long)&a); > + constexpr std::unique_ptr p2 = std::make_unique((long)&a); // { dg-error > "conversion" } > +} > > base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2 > -- > 2.48.1 > Marek
Re: [PATCH] c++: Check invalid use of constrained auto with trailing return type [PR100589]
On Sun, 2 Mar 2025, xxie-xd wrote: > * Patrick Palka [2025-02-28 10:37:37]: > > > Hi, > > > > On Fri, 28 Feb 2025, Da Xie wrote: > > > > > This bug comes from a missing check when a function declaration has a > > > late-specified return type and a constrained auto type specifier. By > > > adding such a check, we can catch such uses and diagnose them as errors. > > > > > > Successfully bootstrapped and regretested on x86_64-pc-linux-gnu: > > > adds 6 new test results and 4 UNSUPPORTED results to g++.sum. The 4 > > > UNSUPPORTED results are due to the fact that concepts are not supported > > > before c++20, so tests cannot pass with c++17 and lower standards. > > > > > > PR c++/100589 > > > > > > gcc/cp/ChangeLog: > > > > > > * decl.cc (grokdeclarator): Issue an error for a declarator with > > > constrained auto type specifier and trailing return types. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * g++.dg/other/pr100589.C: New test. > > > > Thanks for the patch! Looks good to me overall, some minor comments > > below. > > > > Thank you for the review! I'll make changes to the patch based on your > advice. > > > > > > > Signed-off-by: Da Xie > > > --- > > > gcc/cp/decl.cc| 7 +++ > > > gcc/testsuite/g++.dg/other/pr100589.C | 7 +++ > > > 2 files changed, 14 insertions(+) > > > create mode 100644 gcc/testsuite/g++.dg/other/pr100589.C > > > > > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc > > > index 9ca8c6c4481..f283ed996a2 100644 > > > --- a/gcc/cp/decl.cc > > > +++ b/gcc/cp/decl.cc > > > @@ -14037,6 +14037,13 @@ grokdeclarator (const cp_declarator *declarator, > > > "invalid use of %"); > > > return error_mark_node; > > > } > > > + else if (is_constrained_auto (type)) > > > + { > > > + error_at (typespec_loc, "invalid use of " > > > + "constrained auto type specifier " > > > + "in function with trailing return type"); > > > > We should quote the 'auto' in this diagnostic by writing % > > for consistency with the other diagnostics. > > > > I have added an if statement to handle cases where the declarator > represents a named function. In such scenarios, an error will be issued, > including the function name, the use of constrained %, and the > existence of a trailing return type. For all other cases, an error > message will indicate the invalid use of constrained %. > > Curiously, I found that if the same error message occurred multiple > times(In my case, the above error message does not print function names, > so every function/type declartions issues the same error message), > a single dg-error will catch all of them, even if these messages > are issued on different lines of code. I believe that's actually because of the .* in your pattern, which also matches trailing newlines and so it can greedily match errors issued later in the testcase. Normally dg-error matches only errors occurring on the line it's written in. So for that reason it's better to use \[^\n\r\]* instead of .* as the "match anything" pattern in a dg-error. Very rarely do we actually want a dg-error to look into multiple lines of errors. This could use documenting on the wiki page. > I think this is the bug mentioned > in GCC Wiki (https://gcc.gnu.org/wiki/HowToPrepareATestcase#TODO). > Thankfully, in the updated testcase, the error messages are issued for a > function declaration and type declaration each. I have updated the > dg-error regex pattern to capture error messages separately for > functions and types. This ensures more precise matching and handling of > the respective error cases. > > > > + return error_mark_node; > > > + } > > > tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node); > > > if (!tmpl) > > > if (tree late_auto = type_uses_auto (late_return_type)) > > > diff --git a/gcc/testsuite/g++.dg/other/pr100589.C > > > b/gcc/testsuite/g++.dg/other/pr100589.C > > > new file mode 100644 > > > index 000..3fa054ed34d > > > --- /dev/null > > > +++ b/gcc/testsuite/g++.dg/other/pr100589.C > > > > Since this is a concepts-specific test let's name it > > > > g++.dg/cpp2a/concepts-pr100589.C > > > > Renamed. > > > > @@ -0,0 +1,7 @@ > > > +// PR c++/100589 > > > +// { dg-do compile { target c++20 } } > > > + > > > +template > > > +concept false_concept = false; > > > + > > > +false_concept auto f() -> int; // { dg-error "constrained auto type > > > specifier.* trailing return type" } > > > > It'd be good to also test a bare function type: > > > > using type = false_concept auto() -> int; > > > > Test added. > > > > \ No newline at end of file > > > -- > > > 2.34.1 > > > > > > > > Thanks again for your reply. I will send updated patch when the > bootstrap and regression tests are finished. > >
Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
On 2025-03-03 18:00, Richard Earnshaw wrote: On 28/02/2025 16:18, Richard Earnshaw wrote: On 28/02/2025 16:12, Richard Earnshaw wrote: On 08/11/2024 18:47, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- A long time ago, this test forced -march=armv6. With -marm, the generated assembler is: foo: sub r0, r0, #48 cmp r0, #9 movhi r0, #0 movls r0, #1 bx lr With -mthumb, the generated assembler is: foo: subs r0, r0, #48 movs r2, #9 uxtb r3, r0 movs r0, #0 cmp r2, r3 adcs r0, r0, r0 uxtb r0, r0 bx lr Looking at this code, I think both the uxtb instructions are unnecessary. For the first UXTB. On entry, 'c' is a char (unsigned) so the value is passed by the caller in a 32-bit reg, but range-limited (by the ABI) to values between 0 and 255. We subtract 48 from that, so the range now, is from -48 to 207, but we're going to look at this as an unsigned value, so it's effectively 0..207 or UINT_MAX-48..UINT_MAX. The UXTB instruction then converts the range so that the range becomes 0..255 again (but importantly, the values between UINT_MAX-48 and UINT_MAX are mapped to the range 208..255. We then do an unsigned comparison between that value and the constant 9 to test whether the result is >= 9. Now, importantly, all the values that are in the range 208..255 are >= 9, but were also >= 9 before the UXTB instruction. In effect, the UXTB is completely redundant. I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That does affect the rest of the argument. R. The second UXTB is even more egregious. We have 0 in r0 and we then conditionally add 1 to it, so r0 is in the range 0..1. Zero extending that with a UXTB is therefore clearly pointless. So neither of the UXTB instructions should be present, even if generating Thumb1 code. I think the failures are, therefore, real and we should look to fix the compiler rather than 'fix' the scope of the test. R. Require effective-target arm_arm_ok to skip the test for thumb-only targets (Cortex-M). gcc/testsuite/ChangeLog: * gcc.target/arm/unsigned-extend-1.c: Use effective-target arm_arm_ok. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ testsuite/gcc.target/arm/unsigned-extend-1.c index 3b4ab048fb0..73f2e1a556d 100644 --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ /* { dg-options "-O2" } */ unsigned char foo (unsigned char c) I've just pushed a patch to eliminate one of the redundant zero-extends. The other is harder to eliminate, so I've adjusted the test to xfail when compiling for thumb1. Hi Richard, Can this be backported to releases/gcc-14 too? I can do the backport if you just approve it. Kind regards, Torbjörn
Re: [PATCH v2] c++: Check invalid use of constrained auto with trailing return type [PR100589]
On Sun, 2 Mar 2025, Da Xie wrote: > Add check for constrained auto type specifier in function declaration or > function type declaration with trailing return type. Issue error if such > usage is detected. > > Test file renamed, and added a new test for type declaration. > > Successfully bootstrapped and regretested on x86_64-pc-linux-gnu: > Added 6 passed and 4 unsupported tests. > > PR c++/100589 > > gcc/cp/ChangeLog: > > * decl.cc (grokdeclarator): Issue an error for a declarator with > constrained auto type specifier and trailing return types. Include > function names if available. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp2a/concepts-pr100589.C: New test. LGTM, thanks! Jason, shall I push this to trunk? > > Signed-off-by: Da Xie > --- > gcc/cp/decl.cc | 13 + > gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C | 9 + > 2 files changed, 22 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc > index 9ca8c6c4481..337ee65752e 100644 > --- a/gcc/cp/decl.cc > +++ b/gcc/cp/decl.cc > @@ -14037,6 +14037,19 @@ grokdeclarator (const cp_declarator *declarator, > "invalid use of %"); > return error_mark_node; > } > + else if (is_constrained_auto (type)) > + { > + if (funcdecl_p) > + error_at (typespec_loc, > + "%qs function with trailing return type " > + "has constrained % type specifier " > + "rather than plain %", > + name); > + else > + error_at (typespec_loc, > + "invalid use of constrained % type"); > + return error_mark_node; > + } > tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node); > if (!tmpl) > if (tree late_auto = type_uses_auto (late_return_type)) > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C > b/gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C > new file mode 100644 > index 000..0c60d31f29b > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C > @@ -0,0 +1,9 @@ > +// PR c++/100589 > +// { dg-do compile { target c++20 } } > + > +template > +concept false_concept = false; > + > +false_concept auto f() -> int; // { dg-error "'f' .* trailing return type.* > constrained 'auto'" } I can replace the use of .* with \[^\r\n\]* before pushing. > + > +using type = false_concept auto() -> int; // { dg-error "invalid use of > constrained 'auto' type"} > -- > 2.34.1 > >
RE: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])
Agree with Robin and Andrew. I was thinking we may not need to take care of vxrm similar as frm in previous, but seems not that simple up to a point after Richard S points it out. I also failed to locate some spec/doc about the details. I remember Jeff has some findings/comments, and may wait for a while if we need to tweak the vxrm behaviors, like set, consume, global_reg, ... etc. Pan -Original Message- From: Robin Dapp Sent: Tuesday, March 4, 2025 8:51 PM To: Vineet Gupta ; Andrew Waterman Cc: Li, Pan2 ; jeffreya...@gmail.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.sandif...@arm.com; Robin Dapp Subject: Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]) > Yeah I didn't know how to articulate it (and perhaps this still requires > clarification) > > Say we have following > > // reduced version of gcc.target/riscv/rvv/base/float-point-frm-run-1.c > main > set_frm (4); // orig global FRM update > > test_float_point_frm_run_1 (op1, op2, vl) > set_frm (0); > result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl); > assert_equal (1, get_frm ()) > // restore global 4 before returning > > assert_equal (4, get_frm () <-- here call restores global > > vs. > > // reduced version of gcc.target/riscv/rvv/base/float-point-frm-run-5.c > main > set_frm (1); // orig global FRM update > > test_float_point_frm_run_1 (op1, op2, vl) > other_function() > set_frm (2); // also global update > > assert_equal (2, get_frm () <-- here call doesn't restore > > There's no explicit annotation or anything about the FRM, yet in 2nd example > other_function () we consider frm write a global event but not in > test_float_point_frm_run_1 () in 1st example: I'm missing words how to explain > that :-) AFAIR the general idea is that every function can potentially "clobber", i.e. set FRM globally. The intent of the first test is to save and restore the global rounding mode because we modify it locally by a rounding-mode changing intrinsic. However, as you note, the set_frm at the beginning of test_float_point_frm_run_1 has no effect. What should intuitively(?) happen is that it sets a new global FRM that we restore upon exit. We, however, restore the previous global FRM. I believe that's because set_frm is an inline assembly, inlined into test_float_point_frm_run_1 and we expect inline assemblies to _not_ change the rounding mode. That was one of my concerns during the review some time in 2023. If set_frm were a non-inlined function we'd need to assume it changes the rounding mode and restore the value it has after the set_frm call. And that's also what is different in frm-run-5.c (your second snippet). We wrap the set_frm call in another non-inlinable call. Thus, we need to assume the global rounding mode changes and save/restore. I guess it was a deliberate decision to handle asm snippets different. At least as long as they are in the same visible scope it's obvious when they're changing the rounding mode. One could argue that users doing that are on their own. Not sure that argument has worked very well, historically, though ;) -- Regards Robin
Re: [PATCH] testsuite: arm: Use effective-target for pr68674.c test
On 2025-02-28 16:06, Richard Earnshaw wrote: On 08/11/2024 17:44, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- gcc/testsuite/ChangeLog: * gcc.target/arm/pr68674.c: Use effective-target arm_arch_v7a and arm_libc_fp_abi. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/gcc.target/arm/pr68674.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/pr68674.c b/gcc/testsuite/ gcc.target/arm/pr68674.c index 0b3237458fe..3fd562d0518 100644 --- a/gcc/testsuite/gcc.target/arm/pr68674.c +++ b/gcc/testsuite/gcc.target/arm/pr68674.c @@ -1,9 +1,10 @@ /* PR target/68674 */ /* { dg-do compile } */ -/* { dg-require-effective-target arm_neon_ok } */ -/* { dg-require-effective-target arm_fp_ok } */ +/* { dg-require-effective-target arm_arch_v7a_ok } */ +/* { dg-require-effective-target arm_libc_fp_abi_ok } */ /* { dg-options "-O2" } */ -/* { dg-add-options arm_fp } */ +/* { dg-add-options arm_arch_v7a } */ +/* { dg-add-options arm_libc_fp_abi } */ #pragma GCC target ("fpu=vfp") I've spent some time thinking about this issue. If we change the test in the way you suggest we lose some coverage of other variants where this ought to work, but the alternative is to only run the test on platforms where the base architecture is compatible with neon anyway, so we lose coverage for other builds entirely. So there's no easy answer here. However, the options you've picked above pretty much match the original bug report. I think that's sufficient. OK, and apologies for taking so long pondering this issue. R. Pushed as r15-7815-g879fd9c8226 and r14.2.0-852-ga42ef0938f3. Kind regards, Torbjörn
Re: [PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping
Kyrylo Tkachov writes: > Hi all, > > In this testcase late-combine was failing to merge: > dup v31.4s, v31.s[3] > fmla v30.4s, v31.4s, v29.4s > into the lane-wise fmla form. > This is because late-combine checks may_trap_p under the hood on the dup insn. > This ended up returning true for the insn: > (set (reg:V4SF 152 [ _32 ]) > (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ]) > (parallel:V4SF [ > (const_int 3 [0x3])] > > Although mem_trap_p correctly reasoned that vec_duplicate and vec_select of > floating-point modes can't trap, it assumed that the V4SF parallel can trap. > The correct behaviour is to recurse into vector inside the PARALLEL and check > the sub-expression. This patch adjusts may_trap_p_1 to do just that. > With this check the above insn is not deemed to be trapping and is propagated > into the FMLA giving: > fmla vD.4s, vA.4s, vB.s[3] > > Bootstrapped and tested on aarch64-none-linux-gnu. > Apparently this also fixes a regression in > gcc.target/aarch64/vmul_element_cost.c that I observed. > > Signed-off-by: Kyrylo Tkachov > > gcc/ > > PR rtl-optimization/119046 > * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. > > gcc/testsuite/ > > PR rtl-optimization/119046 > * gcc.target/aarch64/pr119046.c: New test. > > From 39fa1be73e8fed7fc673329e1854ba41587623c4 Mon Sep 17 00:00:00 2001 > From: Kyrylo Tkachov > Date: Thu, 27 Feb 2025 09:00:25 -0800 > Subject: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with > floating-point mode as trapping > > In this testcase late-combine was failing to merge: > dup v31.4s, v31.s[3] > fmlav30.4s, v31.4s, v29.4s > into the lane-wise fmla form. > This is because late-combine checks may_trap_p under the hood on the dup insn. > This ended up returning true for the insn: > (set (reg:V4SF 152 [ _32 ]) > (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ]) > (parallel:V4SF [ > (const_int 3 [0x3])] > > Although mem_trap_p correctly reasoned that vec_duplicate and vec_select of > floating-point modes can't trap, it assumed that the V4SF parallel can trap. > The correct behaviour is to recurse into vector inside the PARALLEL and check > the sub-expression. This patch adjusts may_trap_p_1 to do just that. > With this check the above insn is not deemed to be trapping and is propagated > into the FMLA giving: > fmlavD.4s, vA.4s, vB.s[3] > > Bootstrapped and tested on aarch64-none-linux-gnu. > Apparently this also fixes a regression in > gcc.target/aarch64/vmul_element_cost.c that I observed. > > Signed-off-by: Kyrylo Tkachov > > gcc/ > > PR rtl-optimization/119046 > * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. I don't object to this, if someone else agrees that it's the right fix. But the mode on the parallel looks iffy to me, in that the data is not floating-point data. VOIDmode would probably be more accurate and seems to be what x86 uses. Thanks, Richard > > gcc/testsuite/ > > PR rtl-optimization/119046 > * gcc.target/aarch64/pr119046.c: New test. > --- > gcc/rtlanal.cc | 1 + > gcc/testsuite/gcc.target/aarch64/pr119046.c | 16 > 2 files changed, 17 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr119046.c > > diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc > index 8caffafdaa4..7ad67afb9fe 100644 > --- a/gcc/rtlanal.cc > +++ b/gcc/rtlanal.cc > @@ -3252,6 +3252,7 @@ may_trap_p_1 (const_rtx x, unsigned flags) > return true; >break; > > +case PARALLEL: > case NEG: > case ABS: > case SUBREG: > diff --git a/gcc/testsuite/gcc.target/aarch64/pr119046.c > b/gcc/testsuite/gcc.target/aarch64/pr119046.c > new file mode 100644 > index 000..aa5fa7c848c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr119046.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2" } */ > + > +#include > + > +float32x4_t madd_helper_1(float32x4_t a, float32x4_t b, float32x4_t d) > +{ > + float32x4_t t = a; > + t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d); > + t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d); > + return t; > +} > + > +/* { dg-final { scan-assembler-not {\tdup\tv[0-9]+\.4s, v[0-9]+.s\[1\]\n} } > } */ > +/* { dg-final { scan-assembler-times {\tfmla\tv[0-9]+\.4s, v[0-9]+\.4s, > v[0-9]+\.s\[1\]\n} 2 } } */ > +
Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)
Hi Richard&Kyrill, >> I’m in favour of this. > > Yeah, seems ok to me too. I suppose we ought to update the documentation too: I've added a note to the documentation. However it is impossible to be complete here since many targets switch off early scheduling under various circumstances. So I've just mentioned what x86 and AArch64 do (see v2 below). Cheers, Wilco v2: Update documentation Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits from much faster build times with -O2, but avoids the regressions in lbm which is very sensitive to minor scheduling changes due to long FMA chains. gcc: PR target/118351 * common/config/aarch64/aarch64-common.cc: Enable early scheduling with -O3 and higher. * doc/invoke.texi (-fschedule-insns): Update comment. --- diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc index 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd 100644 --- a/gcc/common/config/aarch64/aarch64-common.cc +++ b/gcc/common/config/aarch64/aarch64-common.cc @@ -54,8 +54,10 @@ static const struct default_options aarch_option_optimization_table[] = { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 }, /* Enable -fsched-pressure by default when optimizing. */ { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, -/* Disable early scheduling due to high compile-time overheads. */ +/* Except for -O3 and higher, disable early scheduling due to high + compile-time overheads. */ { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 }, /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL }, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index bad49a017cc1bb0922eba9f0b2db80166d409cd7..5eb2e96c483126ad075b427969e8c02ce3f51e7a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13503,7 +13503,9 @@ helps machines that have slow floating point or memory load instructions by allowing other instructions to be issued until the result of the load or floating-point instruction is required. -Enabled at levels @option{-O2}, @option{-O3}. +Enabled at levels @option{-O2}, @option{-O3}. Many targets use a different +default, for example, it is disabled on x86 and enabled only at level +@option{-O3} on AArch64. @opindex fschedule-insns2 @item -fschedule-insns2
[PATCH] c++: disable -Wnonnull in unevaluated context [PR115580]
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This PR complains that we issue a -Wnonnull even in a decltype. Since we can't use cp_unevaluated_operand in c-common.cc, this fix targets even -Wformat and -Wrestrict. I think that's fine. PR c++/115580 gcc/cp/ChangeLog: * call.cc (build_over_call): Don't call check_function_arguments when cp_unevaluated_operand. gcc/testsuite/ChangeLog: * g++.dg/warn/Wnonnull16.C: New test. --- gcc/cp/call.cc | 1 + gcc/testsuite/g++.dg/warn/Wnonnull16.C | 16 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Wnonnull16.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index be9b0cf62f1..87ca02507cf 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -10669,6 +10669,7 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) bool warned_p = false; if ((complain & tf_warning) + && !cp_unevaluated_operand && (warn_nonnull || warn_format || warn_suggest_attribute_format diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull16.C b/gcc/testsuite/g++.dg/warn/Wnonnull16.C new file mode 100644 index 000..8740f351ac7 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wnonnull16.C @@ -0,0 +1,16 @@ +// PR c++/115580 +// { dg-do compile { target c++11 } } + +class WithMember { +public: + int foo(); +}; + +decltype(((WithMember*)nullptr)->foo()) footype; // { dg-bogus "pointer is null" } + +int f(void*) __attribute__((nonnull)); + +void g() +{ + [[maybe_unused]] decltype(f(nullptr)) b; // { dg-bogus "non-null" } +} base-commit: 879fd9c822633ecf2c62471d1a7f9b9619e296b7 -- 2.48.1
[PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit
Hi all, For bf8 -> pf16 convert, when dst is 256 bit, the mask should be 16 bit since 16*16=256, not the 8 bit in the current intrin. In 512 bit intrin, the mask bit is also halved. This patch will fix both of them. Ok for trunk? Thx, Haochen gcc/ChangeLog: * config/i386/avx10_2-512convertintrin.h (_mm512_mask_cvtbf8_ph): Correct mask width. (_mm512_maskz_cvtbf8_ph): Ditto. * config/i386/avx10_2convertintrin.h (_mm256_mask_cvtbf8_ph): Ditto. (_mm256_maskz_cvtbf8_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-convert-1.c: Change function call. * gcc.target/i386/avx10_2-convert-1.c: Ditto. --- gcc/config/i386/avx10_2-512convertintrin.h| 4 ++-- gcc/config/i386/avx10_2convertintrin.h| 4 ++-- gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c | 4 ++-- gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/config/i386/avx10_2-512convertintrin.h b/gcc/config/i386/avx10_2-512convertintrin.h index 1079e0a2bda..a44481e0b4e 100644 --- a/gcc/config/i386/avx10_2-512convertintrin.h +++ b/gcc/config/i386/avx10_2-512convertintrin.h @@ -550,7 +550,7 @@ _mm512_cvtbf8_ph (__m256i __A) extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtbf8_ph (__m512h __S, __mmask16 __U, __m256i __A) +_mm512_mask_cvtbf8_ph (__m512h __S, __mmask32 __U, __m256i __A) { return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_mask_slli_epi16 ( (__m512i) __S, __U, (__m512i) _mm512_cvtepi8_epi16 (__A), 8)); @@ -558,7 +558,7 @@ _mm512_mask_cvtbf8_ph (__m512h __S, __mmask16 __U, __m256i __A) extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtbf8_ph (__mmask16 __U, __m256i __A) +_mm512_maskz_cvtbf8_ph (__mmask32 __U, __m256i __A) { return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_slli_epi16 ( (__m512i) _mm512_maskz_cvtepi8_epi16 (__U, __A), 8)); diff --git a/gcc/config/i386/avx10_2convertintrin.h b/gcc/config/i386/avx10_2convertintrin.h index 3fc51b17435..7c9c238a3b4 100644 --- a/gcc/config/i386/avx10_2convertintrin.h +++ b/gcc/config/i386/avx10_2convertintrin.h @@ -1004,7 +1004,7 @@ _mm256_cvtbf8_ph (__m128i __A) extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_cvtbf8_ph (__m256h __S, __mmask8 __U, __m128i __A) +_mm256_mask_cvtbf8_ph (__m256h __S, __mmask16 __U, __m128i __A) { return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_mask_slli_epi16 ( (__m256i) __S, __U, (__m256i) _mm256_cvtepi8_epi16 (__A), 8)); @@ -1012,7 +1012,7 @@ _mm256_mask_cvtbf8_ph (__m256h __S, __mmask8 __U, __m128i __A) extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_maskz_cvtbf8_ph (__mmask8 __U, __m128i __A) +_mm256_maskz_cvtbf8_ph (__mmask16 __U, __m128i __A) { return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_slli_epi16 ( (__m256i) _mm256_maskz_cvtepi8_epi16 (__U, __A), 8)); diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c index bda74b5776b..c1e44efdb2f 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c @@ -183,6 +183,6 @@ void extern avx10_2_512_cvtbf8_fp16_test (void) { y = _mm512_cvtbf8_ph (z1); - y = _mm512_mask_cvtbf8_ph (z, m16, z1); - y = _mm512_maskz_cvtbf8_ph (m16, z1); + y = _mm512_mask_cvtbf8_ph (z, m32, z1); + y = _mm512_maskz_cvtbf8_ph (m32, z1); } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c index 57b5fce7fb6..729496f7173 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c @@ -289,6 +289,6 @@ avx10_2_cvtbf8_fp16_test (void) y = _mm_maskz_cvtbf8_ph (m8, z3); y2 = _mm256_cvtbf8_ph (z3); - y2 = _mm256_mask_cvtbf8_ph (z2, m8, z3); - y2 = _mm256_maskz_cvtbf8_ph (m8, z3); + y2 = _mm256_mask_cvtbf8_ph (z2, m16, z3); + y2 = _mm256_maskz_cvtbf8_ph (m16, z3); } -- 2.31.1
Re: [PATCH] c++: Fix checking assert upon invalid class definition [PR116740]
On 2/18/25 8:00 AM, Simon Martin wrote: A checking assert triggers upon the following invalid code since GCC 11: === cut here === class { a (struct b; } struct b === cut here === The problem is that during error recovery, we call set_identifier_type_value_with_scope for B in the global namespace, and the checking assert added via r11-7228-g8f93e1b892850b fails. This patch relaxes that assert to not fail if we've seen a parser error (it a generalization of another fix done to that checking assert via r11-7266-g24bf79f1798ad1). Successfully tested on x86_64-pc-linux-gnu. OK. PR c++/116740 gcc/cp/ChangeLog: * name-lookup.cc (set_identifier_type_value_with_scope): Don't fail assert with ill-formed input. gcc/testsuite/ChangeLog: * g++.dg/parse/crash80.C: New test. --- gcc/cp/name-lookup.cc| 6 ++ gcc/testsuite/g++.dg/parse/crash80.C | 7 +++ 2 files changed, 9 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/parse/crash80.C diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc index d1abb205bc7..742e5d289dc 100644 --- a/gcc/cp/name-lookup.cc +++ b/gcc/cp/name-lookup.cc @@ -5101,10 +5101,8 @@ set_identifier_type_value_with_scope (tree id, tree decl, cp_binding_level *b) if (b->kind == sk_namespace) /* At namespace scope we should not see an identifier type value. */ gcc_checking_assert (!REAL_IDENTIFIER_TYPE_VALUE (id) -/* We could be pushing a friend underneath a template - parm (ill-formed). */ -|| (TEMPLATE_PARM_P -(TYPE_NAME (REAL_IDENTIFIER_TYPE_VALUE (id); +/* But we might end up here with ill-formed input. */ +|| seen_error ()); else { /* Push the current type value, so we can restore it later */ diff --git a/gcc/testsuite/g++.dg/parse/crash80.C b/gcc/testsuite/g++.dg/parse/crash80.C new file mode 100644 index 000..cd9216adf5c --- /dev/null +++ b/gcc/testsuite/g++.dg/parse/crash80.C @@ -0,0 +1,7 @@ +// PR c++/116740 +// { dg-do "compile" } + +class K { + int a(struct b; // { dg-error "expected '\\)'" } +}; +struct b {};
[pushed] c++: C++23 range-for temps and ?: [PR119073]
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- Here gimplification got confused because extend_temps_r messed up the types of the arms of a COND_EXPR. PR c++/119073 gcc/cp/ChangeLog: * call.cc (extend_temps_r): Preserve types of COND_EXPR arms. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/range-for39.C: New test. --- gcc/cp/call.cc | 2 +- gcc/testsuite/g++.dg/cpp0x/range-for39.C | 12 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/range-for39.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index be9b0cf62f1..f7b4cccb1c7 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -14902,7 +14902,7 @@ extend_temps_r (tree *tp, int *walk_subtrees, void *data) { tree set = build2 (MODIFY_EXPR, boolean_type_node, cur_cond_guard, boolean_true_node); - op = add_stmt_to_compound (set, op); + op = cp_build_compound_expr (set, op, tf_none); } }; walk_arm (TREE_OPERAND (*tp, 1)); diff --git a/gcc/testsuite/g++.dg/cpp0x/range-for39.C b/gcc/testsuite/g++.dg/cpp0x/range-for39.C new file mode 100644 index 000..ebb6acafe7f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/range-for39.C @@ -0,0 +1,12 @@ +// PR c++/119073 +// { dg-do compile { target c++11 } } + +struct A { ~A (); }; +struct B { B (const A &a = A ()); int *begin (); int *end (); ~B (); }; + +void +foo (bool x) +{ + for (auto i : (x ? B{} : B{})) +; +} base-commit: 8d7762945bba3c8321b5856cee3d5e5aed55facf -- 2.48.1
Re: [PATCH] c++: Use capture from outer lambda, if any, instead of erroring out [PR110584]
On 2/18/25 5:12 AM, Simon Martin wrote: We've been rejecting this valid code since r8-4571: === cut here === void foo (float); int main () { constexpr float x = 0; (void) [&] () { foo (x); (void) [] () { foo (x); }; }; } === cut here === The problem is that when processing X in the inner lambda, process_outer_var_ref errors out even though it does find the capture from the enclosing lambda. This patch changes process_outer_var_ref to accept and return the outer proxy if it finds any. Successfully tested on x86_64-pc-linux-gnu. PR c++/110584 gcc/cp/ChangeLog: * semantics.cc (process_outer_var_ref): Use capture from enclosing lambda, if any. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-nested10.C: New test. --- gcc/cp/semantics.cc | 4 ++ .../g++.dg/cpp0x/lambda/lambda-nested10.C | 46 +++ 2 files changed, 50 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 7c7d3e3c432..7bbc82f7dc1 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -4598,6 +4598,10 @@ process_outer_var_ref (tree decl, tsubst_flags_t complain, bool odr_use) if (!odr_use && context == containing_function) decl = add_default_capture (lambda_stack, /*id=*/DECL_NAME (decl), initializer); + /* When doing lambda capture, if we found a capture in an enclosing lambda, + we can use it. */ + else if (!odr_use && is_capture_proxy (decl)) +return decl; What makes this OK is not that it's a capture, but that the captured variable is constant, so it should have satisfied this branch: /* Only an odr-use of an outer automatic variable causes an error, and a constant variable can decay to a prvalue constant without odr-use. So don't complain yet. */ else if (!odr_use && decl_constant_var_p (decl)) return decl; is_constant_capture_proxy might be useful here. And/or a new strip_normal_capture function (that I'm surprised doesn't exist yet). Jason
Re: [PATCH] c++: Apply/diagnose attributes when instatiating ARRAY/POINTER/REFERENCE_TYPE [PR118787]
On 2/11/25 1:04 PM, Jakub Jelinek wrote: Hi! The following testcase IMO in violation of the P2552R3 paper doesn't pedwarn on alignas applying to dependent types or alignas with dependent argument. tsubst was just ignoring TYPE_ATTRIBUTES. The following patch fixes it for the POINTER/REFERENCE_TYPE and ARRAY_TYPE cases, but perhaps we need to do the same also for other types (INTEGER_TYPE/REAL_TYPE and the like). I guess I'll need to construct more testcases. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. 2025-02-11 Jakub Jelinek PR c++/118787 * pt.cc (tsubst) : Use return t; only if it doesn't have any TYPE_ATTRIBUTES. Call apply_late_template_attributes. : Likewise. Formatting fix. * g++.dg/cpp0x/alignas22.C: New test. --- gcc/cp/pt.cc.jj 2025-02-07 17:03:13.560227281 +0100 +++ gcc/cp/pt.cc2025-02-10 17:17:47.65131 +0100 @@ -16854,7 +16854,9 @@ tsubst (tree t, tree args, tsubst_flags_ case POINTER_TYPE: case REFERENCE_TYPE: { - if (type == TREE_TYPE (t) && TREE_CODE (type) != METHOD_TYPE) + if (type == TREE_TYPE (t) + && TREE_CODE (type) != METHOD_TYPE + && TYPE_ATTRIBUTES (t) == NULL_TREE) return t; /* [temp.deduct] @@ -16924,9 +16926,9 @@ tsubst (tree t, tree args, tsubst_flags_ A,' while an attempt to create the type type rvalue reference to cv T' creates the type T" */ - r = cp_build_reference_type - (TREE_TYPE (type), - TYPE_REF_IS_RVALUE (t) && TYPE_REF_IS_RVALUE (type)); + r = cp_build_reference_type (TREE_TYPE (type), + TYPE_REF_IS_RVALUE (t) + && TYPE_REF_IS_RVALUE (type)); else r = cp_build_reference_type (type, TYPE_REF_IS_RVALUE (t)); r = cp_build_qualified_type (r, cp_type_quals (t), complain); @@ -16935,6 +16937,11 @@ tsubst (tree t, tree args, tsubst_flags_ /* Will this ever be needed for TYPE_..._TO values? */ layout_type (r); + if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t), +/*flags=*/0, +args, complain, in_decl)) + return error_mark_node; + return r; } case OFFSET_TYPE: @@ -17009,7 +17016,9 @@ tsubst (tree t, tree args, tsubst_flags_ /* As an optimization, we avoid regenerating the array type if it will obviously be the same as T. */ - if (type == TREE_TYPE (t) && domain == TYPE_DOMAIN (t)) + if (type == TREE_TYPE (t) + && domain == TYPE_DOMAIN (t) + && TYPE_ATTRIBUTES (t) == NULL_TREE) return t; /* These checks should match the ones in create_array_type_for_decl. @@ -17048,6 +17057,11 @@ tsubst (tree t, tree args, tsubst_flags_ TYPE_USER_ALIGN (r) = 1; } + if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t), +/*flags=*/0, +args, complain, in_decl)) + return error_mark_node; + return r; } --- gcc/testsuite/g++.dg/cpp0x/alignas22.C.jj 2025-02-10 17:33:16.242452750 +0100 +++ gcc/testsuite/g++.dg/cpp0x/alignas22.C 2025-02-10 17:36:28.739046629 +0100 @@ -0,0 +1,23 @@ +// PR c++/118787 +// { dg-do compile { target c++11 } } +// { dg-options "-pedantic" } + +template +void foo (T & alignas (N));// { dg-warning "'alignas' on a type other than class" } +template +void bar (T (&)[N] alignas (N)); // { dg-warning "'alignas' on a type other than class" } +template +using U = T * alignas (N); // { dg-warning "'alignas' on a type other than class" } +template +using V = T[N] alignas (N);// { dg-warning "'alignas' on a type other than class" } + +void +baz () +{ + int x alignas (4) = 0; + foo (x); + int y alignas (4) [4]; + bar (y); + U u; + V v; +} Jakub
Re: [PATCH] c++: Don't replace INDIRECT_REFs by a const capture proxy too eagerly [PR117504]
On 2/14/25 12:08 PM, Simon Martin wrote: We have been miscompiling the following valid code since GCC8, and r8-3497-g281e6c1d8f1b4c === cut here === struct span { span (const int (&__first)[1]) : _M_ptr (__first) {} int operator[] (long __i) { return _M_ptr[__i]; } const int *_M_ptr; }; void foo () { constexpr int a_vec[]{1}; auto vec{[&a_vec]() -> span { return a_vec; }()}; } === cut here === The problem is that perform_implicit_conversion_flags (via mark_rvalue_use) replaces "a_vec" in the return statement by a CONSTRUCTOR representing a_vec's constant value, and then takes its address when invoking span's constructor. So we end up with an instance that points to garbage instead of a_vec's storage. I've tried many things to somehow recover from this replacement, but I actually think we should not do it when converting to a class type: we have no idea whether the conversion will involve a constructor taking an address or reference. So we should assume it's the case, and call mark_lvalue_use, not mark_rvalue_use (I might very weel be overseeing things, and feedback is more than welcome). Yeah, those mark_*_use calls seem misplaced, they should be called instead by the code that actually does the conversion. What if we replace all of them here with just mark_exp_read? Or nothing? This is what the patch does, successfully tested on x86_64-pc-linux-gnu. PR c++/117504 gcc/cp/ChangeLog: * call.cc (perform_implicit_conversion_flags): When possibly converting to a class, call mark_lvalue_use. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-117504.C: New test. * g++.dg/cpp2a/constexpr-117504a.C: New test. --- gcc/cp/call.cc| 4 ++ gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C | 60 +++ .../g++.dg/cpp2a/constexpr-117504a.C | 12 3 files changed, 76 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index 38a8f7fdcda..097b1fa55a4 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -13973,6 +13973,10 @@ perform_implicit_conversion_flags (tree type, tree expr, if (TYPE_REF_P (type)) expr = mark_lvalue_use (expr); + else if (MAYBE_CLASS_TYPE_P (type)) +/* We might convert using a constructor that takes the address of EXPR, so + assume that it will be the case. */ +expr = mark_lvalue_use (expr); else expr = mark_rvalue_use (expr); diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C new file mode 100644 index 000..290d3dfd61e --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C @@ -0,0 +1,60 @@ +// PR c++/117504 - Initial report +// { dg-do "run" { target c++20 } } + +struct span { + span (const int (&__first)[1]) : _M_ptr (__first) {} + int operator[] (long __i) { return _M_ptr[__i]; } + const int *_M_ptr; +}; + +constexpr int a_global_vec[]{1}; +span myFunctor() { + return a_global_vec; +} + +int main() { + constexpr int a_vec[]{1}; + + // + // This PR's case, that used to be miscompiled. + // + auto lambda_1 = [&a_vec] () -> span { return a_vec; }; + auto vec_1 { lambda_1 () }; + if (vec_1[0] != 1) +__builtin_abort (); + + // Variant that used to be miscompiled as well. + auto lambda_2 = [&] () -> span { return a_vec; }; + auto vec_2 { lambda_2 () }; + if (vec_2[0] != 1) +__builtin_abort (); + + // + // Related cases that worked already. + // + auto lambda_3 = [&a_vec] () /* -> span */ { return a_vec; }; + auto vec_3 { lambda_3 () }; + if (vec_3[0] != 1) +__builtin_abort (); + + auto lambda_4 = [&] () /* -> span */ { return a_vec; }; + auto vec_4 { lambda_4 () }; + if (vec_4[0] != 1) +__builtin_abort (); + + const int (&vec_5)[1] = a_vec; + if (vec_5[0] != 1) +__builtin_abort (); + + span vec_6 (a_vec); + if (vec_6[0] != 1) +__builtin_abort (); + + auto vec_7 = myFunctor (); + if (vec_7[0] != 1) +__builtin_abort (); + + const int (&vec_8)[1] { a_vec }; + if (vec_8[0] != 1) +__builtin_abort (); +} diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C new file mode 100644 index 000..f6d4dc8cbc5 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C @@ -0,0 +1,12 @@ +// PR c++/117504 - ICE discovered by ppalka@ when reducing. +// { dg-do "compile" { target c++20 } } + +struct span { + span (const int* __first) : _M_ptr (__first) {} + int operator[] (long __i) { return _M_ptr[__i]; } + const int *_M_ptr; +}; +int main() { + constexpr int a_vec[]{1}; + auto vec { [&a_vec]() -> span { return a_vec; } () }; +}
[committed] openmp, c++: Fix up OpenMP/OpenACC handling in C++ modules [PR119102]
Hi! modules.cc has apparently support for extensions and attempts to ensure that if a module is compiled with those extensions enabled, sources which use the module are compiled with the same extensions. The only extension supported is SE_OPENMP right now. And the use of the extension is keyed on streaming out or in OMP_CLAUSE tree. This is undesirable for several reasons. OMP_CLAUSE is the only tree which can appear in the IL even without -fopenmp/-fopenmp-simd/-fopenacc (when simd ("notinbranch") or simd ("inbranch") attributes are used), and it can appear also in all the 3 modes mentioned above. On the other side, with the exception of arguments of attributes added e.g. for declare simd where no harm should be done if -fopenmp/-fopenmp-simd isn't enabled later on, OMP_CLAUSE appears in OMP_*_CLAUSES of OpenMP/OpenACC construct trees. And those construct trees often have no clauses at all, so keying the extension on OMP_CLAUSE doesn't catch many cases that should be caught. Furthermore, for OpenMP we have 2 modes, -fopenmp-simd which parses some OpenMP but constructs from that mostly OMP_SIMD and a few other cases, and -fopenmp which includes that and far more on top of that; and there is also -fopenacc. So, this patch stops setting/requesting the extension on OMP_CLAUSE, introduces 3 extensions rather than one (SE_OPENMP_SIMD, SE_OPENMP and SE_OPENACC) and keyes those on OpenMP constructs from the -fopenmp-simd subset, other OpenMP constructs and OpenACC constructs. 2025-03-05 Jakub Jelinek PR c++/119102 gcc/cp/ * module.cc (enum streamed_extensions): Add SE_OPENMP_SIMD and SE_OPENACC, change value of SE_OPENMP and SE_BITS. (CASE_OMP_SIMD_CODE, CASE_OMP_CODE, CASE_OACC_CODE): Define. (trees_out::start): Don't set SE_OPENMP extension for OMP_CLAUSE. Set SE_OPENMP_SIMD extension for CASE_OMP_SIMD_CODE, SE_OPENMP for CASE_OMP_CODE and SE_OPENACC for CASE_OACC_CODE. (trees_in::start): Don't fail for OMP_CLAUSE with missing SE_OPENMP extension. Do fail for CASE_OMP_SIMD_CODE and missing SE_OPENMP_SIMD extension, or CASE_OMP_CODE and missing SE_OPENMP extension, or CASE_OACC_CODE and missing SE_OPENACC extension. (module_state::write_readme): Write all of SE_OPENMP_SIMD, SE_OPENMP and SE_OPENACC extensions. (module_state::read_config): Diagnose missing -fopenmp, -fopenmp-simd and/or -fopenacc depending on extensions used. gcc/testsuite/ * g++.dg/modules/pr119102_a.H: New test. * g++.dg/modules/pr119102_b.C: New test. * g++.dg/modules/omp-3_a.C: New test. * g++.dg/modules/omp-3_b.C: New test. * g++.dg/modules/omp-3_c.C: New test. * g++.dg/modules/omp-3_d.C: New test. * g++.dg/modules/oacc-1_a.C: New test. * g++.dg/modules/oacc-1_b.C: New test. * g++.dg/modules/oacc-1_c.C: New test. --- gcc/cp/module.cc.jj 2025-03-03 17:36:37.263748933 +0100 +++ gcc/cp/module.cc2025-03-04 17:52:16.439987517 +0100 @@ -3613,8 +3613,10 @@ void slurping::release_macros () /* Flags for extensions that end up being streamed. */ enum streamed_extensions { - SE_OPENMP = 1 << 0, - SE_BITS = 1 + SE_OPENMP_SIMD = 1 << 0, + SE_OPENMP = 1 << 1, + SE_OPENACC = 1 << 2, + SE_BITS = 3 }; /* Counter indices. */ @@ -5276,6 +5278,53 @@ trees_in::tree_list (bool has_purpose) return res; } + +#define CASE_OMP_SIMD_CODE \ +case OMP_SIMD: \ +case OMP_STRUCTURED_BLOCK: \ +case OMP_LOOP: \ +case OMP_ORDERED: \ +case OMP_TILE: \ +case OMP_UNROLL +#define CASE_OMP_CODE \ +case OMP_PARALLEL: \ +case OMP_TASK: \ +case OMP_FOR: \ +case OMP_DISTRIBUTE: \ +case OMP_TASKLOOP: \ +case OMP_TEAMS:\ +case OMP_TARGET_DATA: \ +case OMP_TARGET: \ +case OMP_SECTIONS: \ +case OMP_CRITICAL: \ +case OMP_SINGLE: \ +case OMP_SCOPE:\ +case OMP_TASKGROUP:\ +case OMP_MASKED: \ +case OMP_DISPATCH: \ +case OMP_INTEROP: \ +case OMP_MASTER: \ +case OMP_TARGET_UPDATE:\ +case OMP_TARGET_ENTER_DATA:\ +case OMP_TARGET_EXIT_DATA: \ +case OMP_METADIRECTIVE:\ +case OMP_ATOMIC: \ +case OMP_ATOMIC_READ: \ +case OMP_ATOMIC_CAPTURE_OLD: \ +case OMP_ATOMIC_CAPTURE_NEW +#define CASE_OACC_CODE \ +case OACC_PARALLEL:\ +case OACC_KERNELS: \ +case OACC_SERIAL: \ +case OACC_DATA:\ +case OACC_HOS
Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].
On Fri, 28 Feb 2025 12:48:36 +0100, "Robin Dapp" wrote: > > Okay, let me explain the background of my previous patch. > > > > Prior to applying my patch, for the test case bug-10.c (a reduced example > > of > > a larger program with incorrect runtime results), > > the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as > > follows: > > 1. vsetvli zero,a4,e16,m4,ta,ma + vsetvli zero,a4,e32,m8,ta,ma + vsetvli > > zero,a4,e8,m2,ta,ma > > > > The vsetvli sequence compiled with --param=vsetvl-strategy=optim was as > > follows: > > 2. vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,zero,e8,m2,ta,ma > > > Although vl remains unchanged, the SEW/LMUL ratio in sequence 2 changes, > > leading to undefined behavior. > > The only difference I see with your patch vs without is > > < vsetvli zero,zero,e8,m2,ta,ma > --- > > vsetvli zero,a3,e8,m2,ta,ma > > and we ensure the former doesn't occur in the test. > > But that difference doesn't matter because the ratio is the same before and > after. That's why I'm asking. bug-10.c as is doesn't test anything > reasonable > IMHO. Right, the ratio (or rather the associated LMUL) was wrong but the > current test doesn't make sure it isn't. Can you share the non-reduced (or > less reduced) case? Hi Robin, I apologize for the delayed response. I spent quite a bit of time trying to reproduce the case, and given the passage of time, it wasn't easy to refine the testing. Fortunately, you can see the results here. https://godbolt.org/z/Mc8veW7oT Using GCC version 14.2.0 should allow you to replicate the issue. If all goes as expected, you will encounter a "Segmentation fault (core dumped)." By disassembling the binary, you'll notice the presence of "vsetvli zero,zero,e32,m4,ta,ma", which is where the problem lies, just as I mentioned previously. Best regards, Jin Ma > > /* { dg-do compile { target { rv64 } } } */ > > /* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3" } */ > > > > #include > > > > _Float16 a (uint64_t); > > int8_t b () { > > int c = 100; > > double *d; > > _Float16 *e; > > for (size_t f;; c -= f) > > { > > f = c; > > __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 2, f); > > vfloat16mf4_t g; > > a (1); > > g = __riscv_vfmv_s_f_f16mf4 (2, f); > > vfloat64m1_t i = __riscv_vfmv_s_f_f64m1 (30491, f); > > vuint16mf4_t j; > > __riscv_vsoxei16_v_f16mf4 (e, j, g, f); > > vuint8mf8_t k = __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 3, f); > > __riscv_vsoxei8_v_f64m1 (d, k, i, f); > > } > > } > > > > /* { dg-final { scan-assembler-not "e64,mf4" } } */ > > That works, thanks. > > -- > Regards > Robin
[committed] c++: Fix a comment typo
Hi! During the 118874 coro investigation I found a typo in a comment. Fixed thusly, committed to trunk as obvious. 2025-03-05 Jakub Jelinek * typeck.cc (check_return_expr): Fix comment typo, rom -> from. --- gcc/cp/typeck.cc.jj 2025-02-25 09:26:38.761944513 +0100 +++ gcc/cp/typeck.cc2025-03-04 12:19:41.495160309 +0100 @@ -11450,7 +11450,7 @@ check_return_expr (tree retval, bool *no bool converted = false; tree moved; /* Until C++23, this was only interesting for class type, but in C++23, -we should do the below when we're converting rom/to a class/reference +we should do the below when we're converting from/to a class/reference (a non-scalar type). */ if ((cxx_dialect < cxx23 ? CLASS_TYPE_P (functype) Jakub
[pushed] Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976]
There was an embarrassing typo in the folding of BIT_NOT_EXPR for POLY_INT_CSTs: it used - rather than ~ on the poly_int. Not sure how that happened, but it might have been due to the way that ~x is implemented as -1 - x internally. Tested on aarch64-linux-gnu & powerpc64le-linux-gnu (as part of other testing). Pushed as obvious. I'll backport to all open branches. Richard gcc/ PR tree-optimization/118976 * fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR. * config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function. (aarch64_run_selftests): Run it. --- gcc/config/aarch64/aarch64.cc | 11 +++ gcc/fold-const.cc | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index fe76730b0a7..af3871ce8a1 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -31336,6 +31336,16 @@ aarch64_test_sysreg_encoding_clashes (void) } } +/* Test SVE arithmetic folding. */ + +static void +aarch64_test_sve_folding () +{ + tree res = fold_unary (BIT_NOT_EXPR, ssizetype, +ssize_int (poly_int64 (1, 1))); + ASSERT_TRUE (operand_equal_p (res, ssize_int (poly_int64 (-2, -1; +} + /* Run all target-specific selftests. */ static void @@ -31344,6 +31354,7 @@ aarch64_run_selftests (void) aarch64_test_loading_full_dump (); aarch64_test_fractional_cost (); aarch64_test_sysreg_encoding_clashes (); + aarch64_test_sve_folding (); } } // namespace selftest diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index f9f7f4d2f91..fef7a6cc48e 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -1964,7 +1964,7 @@ const_unop (enum tree_code code, tree type, tree arg0) if (TREE_CODE (arg0) == INTEGER_CST) return fold_not_const (arg0, type); else if (POLY_INT_CST_P (arg0)) - return wide_int_to_tree (type, -poly_int_cst_value (arg0)); + return wide_int_to_tree (type, ~poly_int_cst_value (arg0)); /* Perform BIT_NOT_EXPR on each element individually. */ else if (TREE_CODE (arg0) == VECTOR_CST) { -- 2.25.1
Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]
On Tue, Mar 04, 2025 at 09:26:07AM +, Richard Sandiford wrote: > gcc/ > PR rtl-optimization/119002 > * simplify-rtx.cc > (simplify_context::simplify_logical_relational_operation): Handle > comparisons between CC values. If there is no evidence that the > CC values are unsigned, restrict the fold to always-true or > always-false results. > > gcc/testsuite/ > * gcc.c-torture/execute/ieee/pr119002.c: New test. > * gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O. LGTM. Jakub
Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]
Richard Sandiford writes: > Jakub Jelinek writes: >> On Mon, Mar 03, 2025 at 01:46:20PM +, Richard Sandiford wrote: >>> Jakub Jelinek writes: >>> > On Mon, Mar 03, 2025 at 01:02:07PM +, Richard Sandiford wrote: >>> >> ...how about something like this? Completely untested, and I haven't >>> >> thought about it much. Just didn't want to hold up the discussion. >>> > >>> > Works for me. >>> > >>> > Just wonder if there is anything that will actually verify that XEXP >>> > (op0, 0) >>> > and XEXP (op1, 0) modes are at least from the same class, rather than say >>> > have one of the comparisons in MODE_CC and another in MODE_INT or vice >>> > versa >>> > or whatever other modes. >>> >>> There's: >>> >>> if (!(rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0)) >>> && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1 >>> return 0; >> >> You're right, and rtx_equal_p returns false for GET_MODE differences. >> >> Will you test your patch (with the testcase from my patch) or should I? > > I'm just about to test it with a tweak to the mode check. Should be > done in a couple of hours. Heh. Forgot when I said that that I would need to test on powerpc64le as well, which isn't something I do regularly, and so took a couple of gos to get right. I've now bootstrapped & regression-tested on both powerpc64le-linux-gnu and aarch64-linux-gnu. OK to install? Richard The following testcase is miscompiled on powerpc64le-linux starting with r15-6777. During combine we see: (set (reg:SI 134) (ior:SI (ge:SI (reg:CCFP 128) (const_int 0 [0])) (lt:SI (reg:CCFP 128) (const_int 0 [0] The simplify_logical_relational_operation code (in its current form) was written with arithmetic rather than CC modes in mind. Since CCFP is a CC mode, it fails the HONOR_NANS check, and so the function assumes that ge | lt => true. If one comparison is unsigned then it should be safe to assume that the other comparison is also unsigned, even for CC modes, since the optimisation checks that the comparisons are between the same operands. For the other cases, we can only safely fold comparisons of CC mode values if the result is always-true (15) or always-false (0). It turns out that the original testcase for PR117186, which ran at -O, was relying on the old behaviour for some of the functions. It needs 4-instruction combinations, and so -fexpensive-optimizations, to pass in its intended form. gcc/ PR rtl-optimization/119002 * simplify-rtx.cc (simplify_context::simplify_logical_relational_operation): Handle comparisons between CC values. If there is no evidence that the CC values are unsigned, restrict the fold to always-true or always-false results. gcc/testsuite/ * gcc.c-torture/execute/ieee/pr119002.c: New test. * gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O. Co-authored-by: Jakub Jelinek --- gcc/simplify-rtx.cc | 12 -- .../gcc.c-torture/execute/ieee/pr119002.c | 23 +++ gcc/testsuite/gcc.target/aarch64/pr117186.c | 2 +- 3 files changed, 34 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index c478bd060fc..fe007bc7d96 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -2655,6 +2655,9 @@ simplify_context::simplify_logical_relational_operation (rtx_code code, enum rtx_code code0 = GET_CODE (op0); enum rtx_code code1 = GET_CODE (op1); + machine_mode cmp_mode = GET_MODE (XEXP (op0, 0)); + if (cmp_mode == VOIDmode) +cmp_mode = GET_MODE (XEXP (op0, 1)); /* Assume at first that the comparisons are on integers, and that the operands are therefore ordered. */ @@ -2672,8 +2675,10 @@ simplify_context::simplify_logical_relational_operation (rtx_code code, } else { - /* See whether the operands might be unordered. */ - if (HONOR_NANS (GET_MODE (XEXP (op0, 0 + /* See whether the operands might be unordered. Assume that all +results are possible for CC modes, and punt later if we don't get an +always-true or always-false answer. */ + if (GET_MODE_CLASS (cmp_mode) == MODE_CC || HONOR_NANS (cmp_mode)) all = 15; mask0 = comparison_to_mask (code0) & all; mask1 = comparison_to_mask (code1) & all; @@ -2702,6 +2707,9 @@ simplify_context::simplify_logical_relational_operation (rtx_code code, code = mask_to_unsigned_comparison (mask); else { + if (GET_MODE_CLASS (cmp_mode) == MODE_CC) + return 0; + code = mask_to_comparison (mask); /* LTGT and NE are arithmetically equivalent for ordered operands, with NE being the canonical choice. */ diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c b/gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c new file mode 100644 index 000..af1a705f17
[committed] testsuite: Add tests for already fixed PR [PR119071]
Hi! Uros' r15-7793 fixed this PR as well, I'm just committing tests from the PR so that it can be closed. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as obvious. 2025-03-04 Jakub Jelinek PR rtl-optimization/119071 * gcc.dg/pr119071.c: New test. * gcc.c-torture/execute/pr119071.c: New test. --- gcc/testsuite/gcc.dg/pr119071.c.jj 2025-03-03 21:39:28.828809833 +0100 +++ gcc/testsuite/gcc.dg/pr119071.c 2025-03-03 21:41:21.847251265 +0100 @@ -0,0 +1,45 @@ +/* PR rtl-optimization/119071 */ +/* { dg-do run } */ +/* { dg-options "-O2 -fgimple" } */ + +int a, b; + +int __GIMPLE (ssa,startwith("expand")) +foo (void) +{ + int _1; + int _2; + int _3; + int _5; + _Bool _7; + int _8; + int _9; + _Bool _14; + int _15; + int _16; + _Bool _17; + int _18; + + __BB(2): + _1 = a; + _17 = _1 != _Literal (int) -2; + _18 = (int) _17; + _2 = _18 + _Literal (int) -1; + _3 = _2 + _18; + _14 = _3 != 1; + _15 = (int) _14; + _16 = -_15; + _7 = _3 == 1; + _9 = (int) _7; + _5 = _9 << _16; + _8 = _5 - _15; + b = _8; + return _8; +} + +int +main () +{ + if (foo () != 1) +__builtin_abort (); +} --- gcc/testsuite/gcc.c-torture/execute/pr119071.c.jj 2025-03-03 21:40:21.931077530 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr119071.c 2025-03-03 21:40:11.765217724 +0100 @@ -0,0 +1,15 @@ +/* PR rtl-optimization/119071 */ + +int a, b; + +int +main () +{ + int c = 0; + if (a + 2) +c = 1; + int d = (1 + c - 2 + c == 1) - 1; + b = ((d + 1) << d) + d; + if (b != 1) +__builtin_abort (); +} Jakub
Re: [libstdc++] Testsuite should pay attention to extra flags
Hi! On 2025-03-03T17:30:06+0100, I wrote: > On 2002-04-17T21:37:50-0400, Phil Edwards wrote: >> If the user decides to build the library with extra compiler options >> via --enable-cxx-flags, the testsuite should (by default) use those same >> options when running. > > Hmm, are we sure that's what we actually want? > >> Verified by passing strange things via --enable >> and watching their effects on the testsuite. > >> --- testsuite_flags.in 7 Jan 2002 00:07:27 - 1.11 >> +++ testsuite_flags.in 18 Apr 2002 01:34:43 - >> @@ -49,7 +49,7 @@ case ${query} in >>;; >> --cxxflags) >>CXXFLAGS=' -g @SECTION_FLAGS@ @SECTION_LDFLAGS@ >> - -fmessage-length=0 >> + -fmessage-length=0 @EXTRA_CXX_FLAGS@ >> -DDEBUG_ASSERT -DLOCALEDIR="@glibcpp_localedir@" ' >>echo ${CXXFLAGS} >>;; > > This got installed in Subversion r52450 > (Git commit 822ca943a31961fd7339de6fbe6593118ec7f231). > > To me at least, this comes unexpected: per my reading of (the current) > 'libstdc++-v3/acinclude.m4:GLIBCXX_ENABLE_CXX_FLAGS', I'd have thought > these flags apply only to the build of libstdc++ itself, but apparently > "flags to pass to the compiler while building" or > "extra compiler flags for building" also includes use by > 'make check-target-libstdc++-v3'? > > I'm OK to leave this as-is, if only for hysterical raisins, but would > then proposa a patch to document this behavior? > > > Alas, there's an additional issue: with Subversion r42272 > (Git commit 28e8acb68f8a427552bc9f2d4cae12a1ac477855) -- so, already > in place when the change above got installed -- we'd gotten: > > * testsuite/lib/libstdc++-v3-dg.exp (libstdc++-v3-init): Set flags > appropriately for remote testing and testing installed files without > a build dir. > > -[...] > -set cxxflags [exec sh ${blddir}/testsuite_flags --cxxflags] > -[...] > +if [is_remote host] { > + [...] > + set cxxflags "-ggdb3 -DDEBUG_ASSERT" > + [...] > +} else { > +# If we find a testsuite_flags file, we're testing in the build > dir. > +set flags_file "${blddir}/testsuite_flags" > +if { [file exists $flags_file] } { > +[...] > +set cxxflags [exec sh $flags_file --cxxflags] > +[...] > +} else { > +[...] > +set cxxflags "-ggdb3 -DDEBUG_ASSERT" > +[...] > +} > +} > > (Nowadays: 'libstdc++-v3/testsuite/lib/libstdc++.exp:libstdc++_init', > similarly.) > > That means, any 'EXTRA_CXX_FLAGS' specified by '--enable-cxx-flags=[...]' > are *not* considered for remote host testing -- which in turn appears to > be in conflict with the original intent, quoted at the beginning of this > email? Hmm... > > > I think we should clarify the intended cases where 'EXTRA_CXX_FLAGS' (as > specified by '--enable-cxx-flags=[...]') apply, and then make that apply > in a uniform way? I think I'd be in favor of removing them from > 'testsuite_flags --cxxflags' -- but can't quite tell which use cases > that'd break... I still maintain my position that 'testsuite_flags' should not be about passing compiler flags (as opposed to all kinds of search paths, where it's appropriate). However, 'libstdc++-v3/doc/xml/manual/test.xml' also "You can run the tests with different options by adding them to the output of the '--cxxflags' option of that script, or [...]", so there's more precedence of doing it this way. And, 'testsuite_flags -cxxflags' actually isn't only about 'EXTRA_CXX_FLAGS' but also '-fmessage-length=0 -fno-show-column', and '@SECTION_FLAGS@' ('-ffunction-sections -fdata-sections'), and other random ones, like '-fcf-protection', '-mshstk'. All this doesn't work for the '[is_remote host]' case, as far as I can tell, but resolving that shall not be my problem of today. ;-) And, considering that for GCN/nvptx libstdc++ I also need a way to inject compiler flags into 'make check-target-libstdc++-v3', and these happen to overlap with those needed for the libstdc++ build, so the current 'EXTRA_CXX_FLAGS' actually fits that model. ;-) And for flags that only apply to libstdc++ compilation but not apply to 'make check-target-libstdc++-v3', I may use 'OPTIMIZE_CXXFLAGS'. Sounds like a plan, I guess? (So, effectively no code changes, but I'll still propose updates to the documentation.) Grüße Thomas
[PATCH] arm: Handle fixed PIC register in require_pic_register (PR target/115485)
Commit r9-4307-g89d7557202d25a forgot to accept a fixed PIC register when extending the assert in require_pic_register. arm_pic_register can be set explicitly by the user (e.g. -mpic-register=r9) or implicitly as the default value with -fpic/-fPIC/-fPIE and -mno-pic-data-is-text-relative -mlong-calls, and we want to use/accept it when recording cfun->machine->pic_reg as used to be the case. Tested on arm-none-linux-gnueabihf and arm-none-eabi with several sets of options, covering all M-profile architecture versions. PR target/115485 gcc/ * config/arm/arm.cc (require_pic_register): Fix typos in comment. Handle fixed arm_pic_register. gcc/testsuite/ * g++.target/arm/pr115485.C: New test. --- gcc/config/arm/arm.cc | 5 +++-- gcc/testsuite/g++.target/arm/pr115485.C | 16 2 files changed, 19 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.target/arm/pr115485.C diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 00499a26bae..670f487bcce 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -8086,8 +8086,8 @@ legitimate_pic_operand_p (rtx x) /* Record that the current function needs a PIC register. If PIC_REG is null, a new pseudo is allocated as PIC register, otherwise PIC_REG is used. In - both case cfun->machine->pic_reg is initialized if we have not already done - so. COMPUTE_NOW decide whether and where to set the PIC register. If true, + both cases cfun->machine->pic_reg is initialized if we have not already done + so. COMPUTE_NOW decides whether and where to set the PIC register. If true, PIC register is reloaded in the current position of the instruction stream irregardless of whether it was loaded before. Otherwise, it is only loaded if not already done so (crtl->uses_pic_offset_table is null). Note that @@ -8107,6 +8107,7 @@ require_pic_register (rtx pic_reg, bool compute_now) if (!crtl->uses_pic_offset_table || compute_now) { gcc_assert (can_create_pseudo_p () + || (arm_pic_register != INVALID_REGNUM) || (pic_reg != NULL_RTX && REG_P (pic_reg) && GET_MODE (pic_reg) == Pmode)); diff --git a/gcc/testsuite/g++.target/arm/pr115485.C b/gcc/testsuite/g++.target/arm/pr115485.C new file mode 100644 index 000..491b48c726a --- /dev/null +++ b/gcc/testsuite/g++.target/arm/pr115485.C @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-fPIE -mno-pic-data-is-text-relative -mlong-calls -ffunction-sections" } */ + +struct c1 { + virtual void func1() = 0; +}; +struct c2 { + virtual ~c2() {} +}; +struct c3 : c2, c1 { + void func1() override; + void func3(); +}; +void c3::func1() { + func3(); +} -- 2.34.1
[committed] libstdc++: Remove stray comma in testing docs
libstdc++-v3/ChangeLog: * doc/xml/manual/test.xml: Remove stray comma. * doc/html/manual/test.html: Regenerate. --- Pushed to trunk. libstdc++-v3/doc/html/manual/test.html | 2 +- libstdc++-v3/doc/xml/manual/test.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/doc/html/manual/test.html b/libstdc++-v3/doc/html/manual/test.html index 1c7af1193da..4b295e583f6 100644 --- a/libstdc++-v3/doc/html/manual/test.html +++ b/libstdc++-v3/doc/html/manual/test.html @@ -230,7 +230,7 @@ cat 27_io/objects/char/3_xin.in | a.out The testsuite will create a number of files in the directory in - which you run this command,. Some of those files might use the + which you run this command. Some of those files might use the same name as files created by other testsuites (like the ones for GCC and G++), so you should not try to run all the testsuites in parallel from the same directory. diff --git a/libstdc++-v3/doc/xml/manual/test.xml b/libstdc++-v3/doc/xml/manual/test.xml index 6b7f1b04a2a..963e3e13500 100644 --- a/libstdc++-v3/doc/xml/manual/test.xml +++ b/libstdc++-v3/doc/xml/manual/test.xml @@ -381,7 +381,7 @@ cat 27_io/objects/char/3_xin.in | a.out The testsuite will create a number of files in the directory in - which you run this command,. Some of those files might use the + which you run this command. Some of those files might use the same name as files created by other testsuites (like the ones for GCC and G++), so you should not try to run all the testsuites in parallel from the same directory. -- 2.48.1
Re: libatomic: use HWCAPs in AArch64 ifunc tests
Wilco Dijkstra writes: > Feedback from the kernel team suggests that it's best to only use HWCAPs > rather than also use low-level checks as done by has_lse128() and has_rcpc3(). > So change these to just use HWCAPs which simplifies the code and speeds up > ifunc selection by avoiding expensive system register accesses. Could you give details? I thought it was always known that trapped system register accesses were slow. In the previous versions, the checks seemed to be presented as an up-front price worth paying for faster atomic operations, on the systems that would use those paths. Now the checks are being presented as something that are good to remove to make the code simpler and faster. There have been a few changes to this code in the current release cycle, and each time it seems like the new version is being presented as better than the previous one with single-sentence justifications. Could we instead have a comment in the code discussing the various approaches that we could take, including the ones that previous versions took, describes the trade-offs, and explains why we've chosen to do what we've chosen to do? Thanks, Richard > > Passes regress, OK for commit? > > libatomic: > * config/linux/aarch64/host-config.h (has_lse2): Remove unused arg. > (has_lse128): Change to just use HWCAPs. > (has_rcpc3): Likewise. > > --- > > diff --git a/libatomic/config/linux/aarch64/host-config.h > b/libatomic/config/linux/aarch64/host-config.h > index > d0d44bf18eaa64437f52c2894da6ece9e02618df..6a4f7014323a2ed196cabe408aaa6df0d2521518 > 100644 > --- a/libatomic/config/linux/aarch64/host-config.h > +++ b/libatomic/config/linux/aarch64/host-config.h > @@ -69,7 +69,7 @@ typedef struct __ifunc_arg_t { > # elif defined (LSE2_LRCPC3_ATOP) > # define IFUNC_NCOND(N) 2 > # define IFUNC_COND_1(has_rcpc3 (hwcap, features)) > -# define IFUNC_COND_2(has_lse2 (hwcap, features)) > +# define IFUNC_COND_2(has_lse2 (hwcap)) > # elif defined (LSE128_ATOP) > # define IFUNC_NCOND(N) 1 > # define IFUNC_COND_1(has_lse128 (hwcap, features)) > @@ -86,7 +86,7 @@ typedef struct __ifunc_arg_t { > #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff) > > static inline bool > -has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features) > +has_lse2 (unsigned long hwcap) > { >/* Check for LSE2. */ >if (hwcap & HWCAP_USCAT) > @@ -105,50 +105,20 @@ has_lse2 (unsigned long hwcap, const __ifunc_arg_t > *features) >return false; > } > > -/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, bits[23:20]. > - The minimum value for LSE128 is 0b0011. */ > - > -#define AT_FEAT_FIELD(isar0) (((isar0) >> 20) & 15) > - > static inline bool > has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features) > { > - if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128) > -return true; > - > - /* If LSE2 and CPUID are supported, check for LSE128. */ > - if (hwcap & HWCAP_CPUID && hwcap & HWCAP_USCAT) > -{ > - unsigned long isar0; > - asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0)); > - return AT_FEAT_FIELD (isar0) >= 3; > -} > - > - return false; > + return hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128; > } > > -/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20]. > - The minimum value for LRCPC3 is 0b0011. */ > - > static inline bool > has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features) > { >/* LSE2 is a prerequisite for atomic LDIAPP/STILP - check HWCAP_USCAT since > has_lse2 is more expensive and Neoverse N1 does not have LRCPC3. */ > - if (!(hwcap & HWCAP_USCAT)) > -return false; > - > - if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LRCPC3) > -return true; > - > - if (hwcap & HWCAP_CPUID) > -{ > - unsigned long isar1; > - asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1)); > - return AT_FEAT_FIELD (isar1) >= 3; > -} > - > - return false; > + return (hwcap & HWCAP_USCAT > + && hwcap & _IFUNC_ARG_HWCAP > + && features->_hwcap2 & HWCAP2_LRCPC3); > } > > #endif /* HAVE_IFUNC */
[PATCH] ranger: Improve nonnull_if_nonzero attribute [PR117023]
Hi! On Fri, Nov 22, 2024 at 07:25:23PM -0500, Andrew MacLeod wrote: > I will shortly be submitting , and presumable committing, this patch as > part of a series to improve VRP time for 117467.. > > So it may be in place by the time you need it > On 11/18/24 09:31, Andrew MacLeod wrote: > > Attached is a pre-approved patch which adds a range_query to the > > inferred range mechanism. > > > > The only change you will need to make is to replace "get_range_query > > (cfun)->" with "q->" which is passed in. > > > > This regstraps on x86 without your patch, and I got as far as a > > bootstrap with your patches.. Sorry for the delay. Only recently the remaining patches have been committed, so I'm getting back to this. The current state of the trunk is that it handles constant arg2 (if it is zero or non-zero) and punts on everything else. I've changed get_range_query (cfun)-> to q-> but unfortunately the test FAILs with it, while it improves the if (n >= 42) case where there is a SSA_NAME for n - 10, it doesn't improve the if (n) case, so it feels like it is querying just the global range rather than the local one from the statement, because on the foo (b, n); statement which is guarded by if (n) n should be provably [1, SIZE_MAX]. The patch still bootstraps/regtests on x86_64-linux and i686-linux and improves at least something, so I guess I could just comment out part of the testcase. Any thoughts why this happens though? And is that something that can be improved for GCC 15 or should wait for GCC 16? 2025-03-04 Jakub Jelinek Andrew MacLeod PR c/117023 * gimple-range-infer.cc (gimple_infer_range::gimple_infer_range): For nonnull_if_nonzero attribute check also arg2 range if it doesn't include zero and in that case call add_nonzero too. * gcc.dg/tree-ssa/pr78154-2.c: New test. --- gcc/gimple-range-infer.cc.jj2025-01-02 11:23:24.0 +0100 +++ gcc/gimple-range-infer.cc 2025-03-03 12:37:52.836895208 +0100 @@ -208,8 +208,13 @@ gimple_infer_range::gimple_infer_range ( continue; if (integer_nonzerop (arg2)) add_nonzero (arg); - // FIXME: Can one query here whether arg2 has - // nonzero range if it is a SSA_NAME? + else + { + value_range r (TREE_TYPE (arg2)); + if (q->range_of_expr (r, arg2, s) + && !r.contains_p (build_zero_cst (TREE_TYPE (arg2 + add_nonzero (arg); + } } } // Fallthru and walk load/store ops now. --- gcc/testsuite/gcc.dg/tree-ssa/pr78154-2.c.jj2025-03-03 12:38:37.841273517 +0100 +++ gcc/testsuite/gcc.dg/tree-ssa/pr78154-2.c 2025-03-03 12:38:37.841273517 +0100 @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim -fdelete-null-pointer-checks" } */ +/* { dg-skip-if "" { keeps_null_pointer_checks } } */ + +void foo (void *, __SIZE_TYPE__) __attribute__((nonnull_if_nonzero (1, 2))); +void baz (void); + +void +bar (void *a, void *b, void *c, void *d, void *e, __SIZE_TYPE__ n) +{ + foo (a, 42); + if (a == 0) +__builtin_abort (); + if (n) +{ + foo (b, n); + if (b == 0) + __builtin_abort (); +} + if (n >= 42) +{ + foo (c, n - 10); + if (c == 0) + __builtin_abort (); +} + foo (d, 0); + if (d == 0) +baz (); + if (n != 42) +{ + foo (e, n); + if (e == 0) + baz (); +} +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ +/* { dg-final { scan-tree-dump-times "baz \\\(" 2 "evrp" } } */ Jakub
Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)
Kyrylo Tkachov writes: >> On 3 Mar 2025, at 19:58, Wilco Dijkstra wrote: >> >> >> Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 >> benefits >> from much faster build times with -O2, but avoids the regressions in lbm >> which >> is very sensitive to minor scheduling changes due to long FMA chains. We can >> then revisit this for GCC16. >> > > I’m in favour of this. Yeah, seems ok to me too. I suppose we ought to update the documentation too: @opindex fschedule-insns @item -fschedule-insns If supported for the target machine, attempt to reorder instructions to eliminate execution stalls due to required data being unavailable. This helps machines that have slow floating point or memory load instructions by allowing other instructions to be issued until the result of the load or floating-point instruction is required. Enabled at levels @option{-O2}, @option{-O3}. Richard > So ok for me if no objections within 24h. > Thanks, > Kyrill > >> gcc: >>PR target/118351 >>* common/config/aarch64/aarch64-common.cc: Enable early scheduling >> with >>-O3 and higher. >> >> --- >> >> diff --git a/gcc/common/config/aarch64/aarch64-common.cc >> b/gcc/common/config/aarch64/aarch64-common.cc >> index >> 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd >> 100644 >> --- a/gcc/common/config/aarch64/aarch64-common.cc >> +++ b/gcc/common/config/aarch64/aarch64-common.cc >> @@ -54,8 +54,10 @@ static const struct default_options >> aarch_option_optimization_table[] = >> { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 }, >> /* Enable -fsched-pressure by default when optimizing. */ >> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, >> -/* Disable early scheduling due to high compile-time overheads. */ >> +/* Except for -O3 and higher, disable early scheduling due to high >> + compile-time overheads. */ >> { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, >> +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 }, >> /* Enable redundant extension instructions removal at -O2 and higher. */ >> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, >> { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL }, >>
Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)
> On 3 Mar 2025, at 19:58, Wilco Dijkstra wrote: > > > Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits > from much faster build times with -O2, but avoids the regressions in lbm which > is very sensitive to minor scheduling changes due to long FMA chains. We can > then revisit this for GCC16. > I’m in favour of this. So ok for me if no objections within 24h. Thanks, Kyrill > gcc: >PR target/118351 >* common/config/aarch64/aarch64-common.cc: Enable early scheduling with >-O3 and higher. > > --- > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/gcc/common/config/aarch64/aarch64-common.cc > index > 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd > 100644 > --- a/gcc/common/config/aarch64/aarch64-common.cc > +++ b/gcc/common/config/aarch64/aarch64-common.cc > @@ -54,8 +54,10 @@ static const struct default_options > aarch_option_optimization_table[] = > { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 }, > /* Enable -fsched-pressure by default when optimizing. */ > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > -/* Disable early scheduling due to high compile-time overheads. */ > +/* Except for -O3 and higher, disable early scheduling due to high > + compile-time overheads. */ > { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, > +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 }, > /* Enable redundant extension instructions removal at -O2 and higher. */ > { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, > { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL }, >
Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]
On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford wrote: > > Richard Sandiford writes: > > Jan Hubicka writes: > >>> > >>> Thanks for running these. I saw poor results for perlbench with my > >>> initial aarch64 hooks because the hooks reduced the cost to zero for > >>> the entry case: > >>> > >>> auto entry_cost = targetm.callee_save_cost > >>> (spill_cost_type::SAVE, hard_regno, mode, saved_nregs, > >>>ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs, > >>>allocated_callee_save_regs, existing_spills_p); > >>> /* In the event of a tie between caller-save and callee-save, > >>>prefer callee-save. We apply this to the entry cost rather > >>>than the exit cost since the entry frequency must be at > >>>least as high as the exit frequency. */ > >>> if (entry_cost > 0) > >>> entry_cost -= 1; > >>> > >>> I "fixed" that by bumping the cost to a minimum of 2, but I was > >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1", > >>> so that the cost is always greater than not using a callee save for > >>> registers that don't cross a call. WDYT? > >> > >> For x86 perfomance costs, the push cost should be memory_move_cost which > >> is 6, -2 for adjustment in the target hook and -1 for this. So cost > >> should not be 0 I think. > >> > >> For size cost, I currently return 1, so we indeed get 0 after > >> adjustment. > >> > >> I think cost of 0 will make us to pick callee save even if caller save > >> is available and there are no function calls, so I guess we do not want > >> that > > > > OK, here's an updated patch that makes that change. The x86 parts > > should be replaced by your patch. > > > > Tested on aarch64-linux-gnu. I also tried to test on pwoerpc64el-linux-gnu > > (on gcc112), but I keep getting broken pipes during the test runs, > > so I'm struggling to get good before/after comparisons. It does at > > least bootstrap though... > > Here's the patch with Honza's x86 changes. Boostrapped & regresiion-tested > on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120). The powerpc64le > results regressed: > > FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 i == 5 > FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 sizeof > (a) == 17 * sizeof (short) > > but the same test already failed for -O2 and -O3. > > OK to install now? Or, given the lateness in the release cycle, > would it be better to wait for GCC 16? I think it's OK to install now. Not installing anything isn't an option, the alternative would be to at least revert HJs change. Thanks, Richard. > > Thanks, > Richard > > > Following on from the discussion in: > > https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html > > this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and > replaces it with two hooks: one that controls the cost of using an > extra callee-saved register and one that controls the cost of allocating > a frame for the first spill. > > (The patch does not attempt to address the shrink-wrapping part of > the thread above.) > > On AArch64, this is enough to fix PR117477, as verified by the new tests. > The patch does not change the SPEC2017 scores significantly. (I saw a > slight improvement in fotonik3d and roms, but I'm not convinced that > the improvements are real.) > > The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c, > which is a scan-dump correctness test that relies on not using > caller saves. The decision to use caller saves looks appropriate, > and saves an instruction, so I've just added -fno-caller-saves > to the test options. > > The x86 parts were written by Honza. > > gcc/ > PR rtl-optimization/117477 > * config/aarch64/aarch64.cc (aarch64_count_saves): New function. > (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost) > (aarch64_frame_allocation_cost): Likewise. > (TARGET_CALLEE_SAVE_COST): Define. > (TARGET_FRAME_ALLOCATION_COST): Likewise. > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > Replace with... > (ix86_callee_save_cost): ...this new hook. > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. > (TARGET_CALLEE_SAVE_COST): Define. > * target.h (spill_cost_type, frame_cost_type): New enums. > * target.def (callee_save_cost, frame_allocation_cost): New hooks. > (ira_callee_saved_register_cost_scale): Delete. > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > Delete. > (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks. > * doc/tm.texi: Regenerate. > * hard-reg-set.h (hard_reg_set_popcount): New function. > * ira-color.cc (allocated_memory_p): New variable. > (allocated_callee_save_regs): Likewise. > (record_allocation): New function. >
[PATCH] go/119098 - bump libgo version for GCC 15 release
As PR119098 shows the gccgo driver uses the install path compiled into the shared libgo and thus updating that but not the driver will cause the driver to fail to find other tools like cgo. Thus bump the SONAME for GCC 15 as well, as we've done for each release in the past. I think libgo changes go through another "upstream", so Ian, can you please take care of this? Thanks, Richard. PR go/119098 libgo/ * configure.ac (libtool_VERSION): Bump. * configure: Regenerate. --- libgo/configure| 2 +- libgo/configure.ac | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libgo/configure b/libgo/configure index c0d0a1560f5..b1a2228fa1b 100755 --- a/libgo/configure +++ b/libgo/configure @@ -2611,7 +2611,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu ac_config_headers="$ac_config_headers config.h" -libtool_VERSION=23:0:0 +libtool_VERSION=24:0:0 # Default to --enable-multilib diff --git a/libgo/configure.ac b/libgo/configure.ac index 898091276f7..0b05551aacb 100644 --- a/libgo/configure.ac +++ b/libgo/configure.ac @@ -10,7 +10,7 @@ AC_INIT(package-unused, version-unused,, libgo) AC_CONFIG_SRCDIR(Makefile.am) AC_CONFIG_HEADER(config.h) -libtool_VERSION=23:0:0 +libtool_VERSION=24:0:0 AC_SUBST(libtool_VERSION) AM_ENABLE_MULTILIB(, ..) -- 2.43.0
[RFA] ira: Add new hooks for callee-save vs spills [PR117477]
Richard Sandiford writes: > Jan Hubicka writes: >>> >>> Thanks for running these. I saw poor results for perlbench with my >>> initial aarch64 hooks because the hooks reduced the cost to zero for >>> the entry case: >>> >>> auto entry_cost = targetm.callee_save_cost >>> (spill_cost_type::SAVE, hard_regno, mode, saved_nregs, >>>ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs, >>>allocated_callee_save_regs, existing_spills_p); >>> /* In the event of a tie between caller-save and callee-save, >>>prefer callee-save. We apply this to the entry cost rather >>>than the exit cost since the entry frequency must be at >>>least as high as the exit frequency. */ >>> if (entry_cost > 0) >>> entry_cost -= 1; >>> >>> I "fixed" that by bumping the cost to a minimum of 2, but I was >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1", >>> so that the cost is always greater than not using a callee save for >>> registers that don't cross a call. WDYT? >> >> For x86 perfomance costs, the push cost should be memory_move_cost which >> is 6, -2 for adjustment in the target hook and -1 for this. So cost >> should not be 0 I think. >> >> For size cost, I currently return 1, so we indeed get 0 after >> adjustment. >> >> I think cost of 0 will make us to pick callee save even if caller save >> is available and there are no function calls, so I guess we do not want >> that > > OK, here's an updated patch that makes that change. The x86 parts > should be replaced by your patch. > > Tested on aarch64-linux-gnu. I also tried to test on pwoerpc64el-linux-gnu > (on gcc112), but I keep getting broken pipes during the test runs, > so I'm struggling to get good before/after comparisons. It does at > least bootstrap though... Here's the patch with Honza's x86 changes. Boostrapped & regresiion-tested on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120). The powerpc64le results regressed: FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 i == 5 FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 sizeof (a) == 17 * sizeof (short) but the same test already failed for -O2 and -O3. OK to install now? Or, given the lateness in the release cycle, would it be better to wait for GCC 16? Thanks, Richard Following on from the discussion in: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and replaces it with two hooks: one that controls the cost of using an extra callee-saved register and one that controls the cost of allocating a frame for the first spill. (The patch does not attempt to address the shrink-wrapping part of the thread above.) On AArch64, this is enough to fix PR117477, as verified by the new tests. The patch does not change the SPEC2017 scores significantly. (I saw a slight improvement in fotonik3d and roms, but I'm not convinced that the improvements are real.) The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c, which is a scan-dump correctness test that relies on not using caller saves. The decision to use caller saves looks appropriate, and saves an instruction, so I've just added -fno-caller-saves to the test options. The x86 parts were written by Honza. gcc/ PR rtl-optimization/117477 * config/aarch64/aarch64.cc (aarch64_count_saves): New function. (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost) (aarch64_frame_allocation_cost): Likewise. (TARGET_CALLEE_SAVE_COST): Define. (TARGET_FRAME_ALLOCATION_COST): Likewise. * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): Replace with... (ix86_callee_save_cost): ...this new hook. (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. (TARGET_CALLEE_SAVE_COST): Define. * target.h (spill_cost_type, frame_cost_type): New enums. * target.def (callee_save_cost, frame_allocation_cost): New hooks. (ira_callee_saved_register_cost_scale): Delete. * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks. * doc/tm.texi: Regenerate. * hard-reg-set.h (hard_reg_set_popcount): New function. * ira-color.cc (allocated_memory_p): New variable. (allocated_callee_save_regs): Likewise. (record_allocation): New function. (assign_hard_reg): Use targetm.frame_allocation_cost to model the cost of the first spill or first caller save. Use targetm.callee_save_cost to model the cost of using new callee-saved registers. Apply the exit rather than entry frequency to the cost of restoring a register or deallocating the frame. Update the new variables above. (improve_allocatio
[PATCH] c++: Handle RAW_DATA_CST in modules.cc [PR119076]
Hi! The following testcases (one with #embed, one with large initializer turned into RAW_DATA_CST) show that I forgot to handle RAW_DATA_CST in module streaming. Similar to the PCH case we need to stream out RAW_DATA_CST with NULL RAW_DATA_OWNER (i.e. a tree which has data owned by libcpp buffer) so that it will be streamed back in as STRING_CST which owns the data, but because the data can be really large (hopefully not so much for header modules though), without actually trying to build a STRING_CST on the module writing side because that would mean another large allocation and copying of the large data. RAW_DATA_CST with RAW_DATA_OWNER then needs to be streamed out and in by streaming the owner and offset from owner's data and length. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-03-04 Jakub Jelinek PR c++/119076 * module.cc (trees_out::start): Handle RAW_DATA_CST. (trees_in::start): Likewise. (trees_out::core_vals): Likewise. (trees_in::core_vals): Likewise. * g++.dg/modules/pr119076-1_a.H: New test. * g++.dg/modules/pr119076-1_b.C: New test. * g++.dg/modules/pr119076-2_a.H: New test. * g++.dg/modules/pr119076-2_b.C: New test. --- gcc/cp/module.cc.jj 2025-01-28 09:23:38.195599960 +0100 +++ gcc/cp/module.cc2025-03-04 00:52:50.947063367 +0100 @@ -5319,6 +5319,30 @@ trees_out::start (tree t, bool code_stre str (TREE_STRING_POINTER (t), TREE_STRING_LENGTH (t)); break; +case RAW_DATA_CST: + if (RAW_DATA_OWNER (t) == NULL_TREE) + { + /* Stream RAW_DATA_CST with no owner (i.e. data pointing +into libcpp buffers) as something we can stream in as +STRING_CST which owns the data. */ + u (0); + /* Can't use str (RAW_DATA_POINTER (t), RAW_DATA_LENGTH (t)); +here as there isn't a null termination after it. */ + z (RAW_DATA_LENGTH (t)); + if (RAW_DATA_LENGTH (t)) + if (void *ptr = buf (RAW_DATA_LENGTH (t) + 1)) + { + memcpy (ptr, RAW_DATA_POINTER (t), RAW_DATA_LENGTH (t)); + ((char *) ptr)[RAW_DATA_LENGTH (t)] = '\0'; + } + } + else + { + gcc_assert (RAW_DATA_LENGTH (t)); + u (RAW_DATA_LENGTH (t)); + } + break; + case VECTOR_CST: u (VECTOR_CST_LOG2_NPATTERNS (t)); u (VECTOR_CST_NELTS_PER_PATTERN (t)); @@ -5400,6 +5424,24 @@ trees_in::start (unsigned code) } break; +case RAW_DATA_CST: + { + size_t l = u (); + if (l == 0) + { + /* Stream in RAW_DATA_CST with no owner as STRING_CST + which owns the data. */ + const char *chars = str (&l); + t = build_string (l, chars); + } + else + { + t = make_node (RAW_DATA_CST); + RAW_DATA_LENGTH (t) = l; + } + } + break; + case VECTOR_CST: { unsigned log2_npats = u (); @@ -6311,6 +6353,22 @@ trees_out::core_vals (tree t) /* Streamed during start. */ break; +case RAW_DATA_CST: + if (RAW_DATA_OWNER (t) == NULL_TREE) + break; /* Streamed as STRING_CST during start. */ + WT (RAW_DATA_OWNER (t)); + if (streaming_p ()) + { + if (TREE_CODE (RAW_DATA_OWNER (t)) == RAW_DATA_CST) + z (RAW_DATA_POINTER (t) - RAW_DATA_POINTER (RAW_DATA_OWNER (t))); + else if (TREE_CODE (RAW_DATA_OWNER (t)) == STRING_CST) + z (RAW_DATA_POINTER (t) + - TREE_STRING_POINTER (RAW_DATA_OWNER (t))); + else + gcc_unreachable (); + } + break; + case VECTOR_CST: for (unsigned ix = vector_cst_encoded_nelts (t); ix--;) WT (VECTOR_CST_ENCODED_ELT (t, ix)); @@ -6845,6 +6903,13 @@ trees_in::core_vals (tree t) /* Streamed during start. */ break; +case RAW_DATA_CST: + RT (RAW_DATA_OWNER (t)); + gcc_assert (TREE_CODE (RAW_DATA_OWNER (t)) == STRING_CST + && TREE_STRING_LENGTH (RAW_DATA_OWNER (t))); + RAW_DATA_POINTER (t) = TREE_STRING_POINTER (RAW_DATA_OWNER (t)) + z (); + break; + case VECTOR_CST: for (unsigned ix = vector_cst_encoded_nelts (t); ix--;) RT (VECTOR_CST_ENCODED_ELT (t, ix)); --- gcc/testsuite/g++.dg/modules/pr119076-1_a.H.jj 2025-03-03 10:22:17.514294281 +0100 +++ gcc/testsuite/g++.dg/modules/pr119076-1_a.H 2025-03-03 12:04:10.041822216 +0100 @@ -0,0 +1,41 @@ +// { dg-additional-options "-fmodule-header -Wno-pedantic" } +// { dg-module-cmi {} } + +constexpr unsigned char a[] = { +#embed __FILE__ +}; + +constexpr int +foo (const unsigned char *p, int s) +{ + int r = 0; + for (int i = 0; i < s; ++i) +r += p[i]; + return r; +} + +constexpr int b = foo (a, sizeof a); + +inline int +bar () +{ + const unsigned char b[] = { +#embed __FILE__ + }; + int r = 0; + for (int i =
Re: [patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists
Hi Thomas, while the patch looks ok to me, why did you not choose to generate a "22.7.2 Variable-Length Parameter Lists" https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Variable-Number-of-Arguments.html when the arguments differ? Then its the callee responsibility to figure stuff out. Just out of curiosity. Is this prohibited by any standard? My question is just out of interest. No need to change the patch. Having a warning is better, then running into "WTF?". Look good to me, ok for mainline. Thanks for the patch, Andre On Mon, 3 Mar 2025 22:50:16 +0100 Thomas Koenig wrote: > Hello world, > > this patch is a bit more complicated than originally envisioned. > > The problem was that we were not handling external dummy arguments > with -fc-prototypes-external. In looking at this, I found that we > were not warning about external procedures with different argument > lists. This can actually be legal (see the two test cases) but > creates a problem for the C prototypes: If we have something like > > subroutine foo(a,n) >external a >if (n == 1) call a(1) >if (n == 2) call a(2,3) > end subroutine foo > > then, pre-C23, we could just have written out the prototype as > > void foo_ (void (*a) (), int *n); > > but this is illegal in C23. What to do? I finally chose to warn > about the argument mismatch, with a new option. Warn only because the > code above is legal, but include in -Wall because such code seems highly > suspect. This option is also implied in -fc-prototypes-external. I also > put a warning in the generated header file in that case, so users > have a chance to see what is going on (especially since gcc now > defaults to C23). > > Regression-tested. > > Comments? Suggestions for better wordings? Is -Wall too strong, > should this be -Wextra (but then nobody would see it, probably...)? > OK for trunk? > > Best regards > > Thomas -- Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH 0/2] __builtin_bswapXX: improve docs and tests
Let's see if I get this correct... Apologies in advance if not. While browsing the intrinsic/built-in functions documentation, I had to think a bit extra to see what the function did as the example was ambiguous. Searching for the place to change it, I also realized that the tests were not ideal as the will not detect the (admittedly unlikely) event that a nibble swap instruction is used. BR Oscar Gustafsson
[PATCH 1/2] __builtin_bswapXX: improve docs and tests
ChangeLog: * doc/extend.texi: Improve example for __builtin_swap16. --- diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ec9bb59900c..83f6e45170b 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -16338,7 +16338,7 @@ Returns the first argument raised to the power of the second. Unlike the @defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})} Returns @var{x} with the order of the bytes reversed; for example, -@code{0xaabb} becomes @code{0xbbaa}. Byte here always means +@code{0xabcd} becomes @code{0xcdab}. Byte here always means exactly 8 bits. @enddefbuiltin
[PATCH 1/2] __builtin_bswapXX: improve docs and tests
gcc/testsuite/ChangeLog * gcc.dg/builtin-bswap-5.c: improve test vector to avoid nibble swaps passing --- diff --git a/gcc/testsuite/gcc.dg/builtin-bswap-5.c b/gcc/testsuite/gcc.dg/builtin-bswap-5.c index b29931e4e10..2ba6c31194f 100644 --- a/gcc/testsuite/gcc.dg/builtin-bswap-5.c +++ b/gcc/testsuite/gcc.dg/builtin-bswap-5.c @@ -6,13 +6,13 @@ main (void) /* Test constant folding. */ extern void link_error (void); - if (__builtin_bswap16(0xaabb) != 0xbbaa) + if (__builtin_bswap16(0xabcd) != 0xcdab) link_error (); - if (__builtin_bswap32(0xaabbccdd) != 0xddccbbaa) + if (__builtin_bswap32(0x89abcdef) != 0xefcdab89) link_error (); - if (__builtin_bswap64(0x1122334455667788ULL) != 0x8877665544332211ULL) + if (__builtin_bswap64(0x0123456789abcdefULL) != 0xefcdab8967452301ULL) link_error (); return 0;