[PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch
From: Pan Li We have vec_extract pattern which takes ZVFHMIN as the mode iterator of the V mode. Aka VF_ZVFHMIN iterator. But it will expand to pred_extract_first pattern which takes the ZVFH as the mode iterator of the V mode. AKa VF. The mismatch will result in one ICE similar as below: insn 30 29 31 2 (set (reg:HF 156 [ _2 ]) (unspec:HF [ (vec_select:HF (reg:RVVMF2HF 134 [ _1 ]) (parallel [ (const_int 0 [0]) ])) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1 (nil)) during RTL pass: vregs compress_run-2.c:25:1: internal compiler error: in extract_insn, at recog.cc:2812 0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) ../../../gcc/gcc/rtl-error.cc:108 0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) ../../../gcc/gcc/rtl-error.cc:116 0xb3a545 extract_insn(rtx_insn*) ../../../gcc/gcc/recog.cc:2812 0x1010e9e instantiate_virtual_regs_in_insn ../../../gcc/gcc/function.cc:1612 0x1010e9e instantiate_virtual_regs ../../../gcc/gcc/function.cc:1995 0x1010e9e execute ../../../gcc/gcc/function.cc:2042 The below test suites are passed for this patch. 1. The rv64gcv fully regression test. 2. The rv64gcv build with glibc. There may be other similar issue(s) for the mismatch, we will take care of them by test cases one by one. PR target/115456 gcc/ChangeLog: * config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V which contains the VF_ZVFHMIN for alignment. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr115456-2.c: New test. * gcc.target/riscv/rvv/base/pr115456-3.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/vector-iterators.md | 4 ++- .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++ .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++ 3 files changed, 65 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 47392d0da4c..43137a2a379 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF]) (define_mode_iterator V [VI VF_ZVFHMIN]) +(define_mode_iterator V_ZVFH [VI VF]) + (define_mode_iterator V_VLS [V VLS]) -(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH]) +(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH]) (define_mode_iterator V_VLSI [VI VLSI]) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c new file mode 100644 index 000..453e18b1c79 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c @@ -0,0 +1,31 @@ +/* Test there is no ICE when compile. */ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d -O3 -ftree-vectorize" } */ + +#include +#include + +typedef _Float16 vnx4f __attribute__ ((vector_size (8))); + +vnx4f __attribute__ ((noinline, noclone)) +test_5 (vnx4f x, vnx4f y) +{ + return __builtin_shufflevector (x, y, 1, 3, 6, 7); +} + +int +main (void) +{ + vnx4f test_5_x = {0, 1, 3, 4}; + vnx4f test_5_y = {4, 5, 6, 7}; + vnx4f test_5_except = {1, 4, 6, 7}; + vnx4f test_5_real; + test_5_real = test_5 (test_5_x, test_5_y); + + for (int i = 0; i < 4; i++) +assert (test_5_real[i] == test_5_except[i]); + + return 0; +} + +/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c new file mode 100644 index 000..2c54f1d7538 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize" } */ + +#include +#include + +typedef _Float16 vnx4f __attribute__ ((vector_size (8))); + +vnx4f __attribute__ ((noinline, noclone)) +test_5 (vnx4f x, vnx4f y) +{ + return __builtin_shufflevector (x, y, 1, 3, 6, 7); +} + +int +main (void) +{ + vnx4f test_5_x = {0, 1, 3, 4}; + vnx4f test_5_y = {4, 5, 6, 7}; + vnx4f test_5_except = {1, 4, 6, 7}; + vnx4f test_5_real; + test_5_real = test_5 (test_5_x, test_5_y); + + for (int i = 0; i < 4; i++) +assert (test_5_real[i] == test_5_except[i]); + + return 0; +} + +/* { dg-final { scan-assembler-not {call\s+__extendhfsf2} } } */ +/* { dg-final { scan-assembler-times {vfmv\.f\.s\s+fa[0-9]+,\s*v[0-9]+} 4 } } */ -- 2.34.1
[PATCH] Fix fallout of peeling for gap improvements
The following hopefully addresses an observed bootstrap issue on aarch64 where maybe-uninit diagnostics occur. It also fixes bogus napkin math from myself when I was confusing rounded up size of a single access with rounded up size of the group accessed in a single scalar iteration. So the following puts in a correctness check, leaving a set of peeling for gaps as insufficient. This could be rectified by splitting the last load into multiple ones but I'm leaving this for a followup, better quickly fix the reported wrong-code. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-vect-stmts.cc (get_group_load_store_type): Do not re-use poly-int remain but re-compute with non-poly values. Verify the shortened load is good enough to be covered with a single scalar gap iteration before accepting it. * gcc.dg/vect/pr115385.c: Enable AVX2 if available. --- gcc/testsuite/gcc.dg/vect/pr115385.c | 1 + gcc/tree-vect-stmts.cc | 12 +++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c b/gcc/testsuite/gcc.dg/vect/pr115385.c index a18cd665d7d..baea0b2473f 100644 --- a/gcc/testsuite/gcc.dg/vect/pr115385.c +++ b/gcc/testsuite/gcc.dg/vect/pr115385.c @@ -1,4 +1,5 @@ /* { dg-require-effective-target mmap } */ +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */ #include #include diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index e32d44050e5..ca6052662a3 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -2148,15 +2148,17 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, { /* But peeling a single scalar iteration is enough if we can use the next power-of-two sized partial -access. */ +access and that is sufficiently small to be covered +by the single scalar iteration. */ unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size; if (!nunits.is_constant (&cnunits) || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf) - || ((cremain = remain.to_constant (), true) + || (((cremain = group_size * cvf - gap % cnunits), true) && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits) - && vector_vector_composition_type - (vectype, cnunits / cpart_size, - &half_vtype) == NULL_TREE)) + && (cremain + group_size < cpart_size + || vector_vector_composition_type + (vectype, cnunits / cpart_size, + &half_vtype) == NULL_TREE))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -- 2.35.3
Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test
Hi! on 2024/6/14 05:16, Carl Love wrote: > Segher: > > On 6/13/24 12:51, Segher Boessenkool wrote: > > > >> >>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >>> @@ -1,4 +1,4 @@ >>> -/* { dg-do compile { target powerpc*-*-* } } */ >>> +/* { dg-do run { target powerpc*-*-* } } */ >>> /* { dg-options "-mvsx" } */ >>> /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! >>> has_arch_pwr8 } } } */ >>> /* { dg-require-effective-target powerpc_vsx } */ >> >> Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*" >> already, so you could remove that target clause even (after testing of >> course :-) ) >> >> Okay for trunk with or without that extra tweak. Thank you! > > I updated the patch by removing the target clause as suggested: > > -/* { dg-do compile { target powerpc*-*-* } } */ > +/* { dg-do run } */ > /* { dg-options "-mvsx" } */ > /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 > } } } */ > /* { dg-require-effective-target powerpc_vsx } */ Since you changed this for "run", I think you also want s/powerpc_vsx/vsx_hw/ . BR, Kewen > > Retested on Power 10. Reports 2 passes and no failures. I will go ahead and > commit. > > Thanks. > >Carl
Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs
Hi all, I messed up renaming of the coarray_alloc_comp-test. This is fixed in the second version of the patch. Sorry for the inconvenience. Additionally I figured that this patch also fixed PR fortran/103112. Regtests ok on x86_64 Fedora 39. Ok for mainline? Regards, Andre On Tue, 11 Jun 2024 16:12:38 +0200 Andre Vehreschild wrote: > Hi all, > > attached patch has already been present in 2020, but lost my attention. It > fixes an ICE in the testsuite. The old mails description is: > > attached patch fixes PR96418 where the code in the testsuite when compiled > with -fcoarray=single lead to an ICE. The reason was that the coarray object > was derefed as an array, but it was no array. Introducing the test for the > descriptor removes the ICE. > > Regtests ok on x86_64-linux/Fedora 39. Ok for mainline? > > Regards, > Andre > -- > Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de From 50ed0af756a3eb6b9c95a39c379e1b3f2daf331d Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Tue, 11 Jun 2024 15:24:55 +0200 Subject: [PATCH] Fix ICE when compiling with -fcoarray=single, when derefing a non-array. PR fortran/96418 PR fortran/103112 gcc/fortran/ChangeLog: * trans.c (gfc_deallocate_with_status): Check that object to deref is an array, before applying array deref. gcc/testsuite/ChangeLog: * gfortran.dg/coarray_alloc_comp_3.f08: Moved to... * gfortran.dg/coarray/alloc_comp_8.f90: ...here. Should be tested for both -fcoarray=single and lib, resp. * gfortran.dg/coarray_alloc_comp_4.f08: Fix program name. --- gcc/fortran/trans.cc | 3 ++- .../{coarray_alloc_comp_3.f08 => coarray/alloc_comp_8.f90} | 3 +-- gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) rename gcc/testsuite/gfortran.dg/{coarray_alloc_comp_3.f08 => coarray/alloc_comp_8.f90} (95%) diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc index a208afe90ab..1335b8cc48b 100644 --- a/gcc/fortran/trans.cc +++ b/gcc/fortran/trans.cc @@ -1838,7 +1838,8 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg, else caf_dereg_type = (enum gfc_coarray_deregtype) coarray_dealloc_mode; } - else if (flag_coarray == GFC_FCOARRAY_SINGLE) + else if (flag_coarray == GFC_FCOARRAY_SINGLE + && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (pointer))) pointer = gfc_conv_descriptor_data_get (pointer); } else if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (pointer))) diff --git a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_8.f90 similarity index 95% rename from gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08 rename to gcc/testsuite/gfortran.dg/coarray/alloc_comp_8.f90 index e2037aa5809..8b153925129 100644 --- a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08 +++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_8.f90 @@ -1,12 +1,11 @@ ! { dg-do run } -! { dg-options "-fcoarray=lib -lcaf_single" } ! { dg-additional-options "-latomic" { target libatomic_available } } ! ! Contributed by Andre Vehreschild ! Check that manually freeing components does not lead to a runtime crash, ! when the auto-deallocation is taking care. -program coarray_alloc_comp_3 +program alloc_comp_6 implicit none type dt diff --git a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 b/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 index 6586ec651dd..4c71a90af8f 100644 --- a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 +++ b/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 @@ -5,7 +5,7 @@ ! Contributed by Andre Vehreschild ! Check that sub-components are caf_deregistered and not freed. -program coarray_alloc_comp_3 +program coarray_alloc_comp_4 implicit none type dt -- 2.45.1
Re: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch
LGTM, thanks :) On Fri, Jun 14, 2024 at 3:02 PM wrote: > > From: Pan Li > > We have vec_extract pattern which takes ZVFHMIN as the mode > iterator of the V mode. Aka VF_ZVFHMIN iterator. But it will > expand to pred_extract_first pattern which takes the ZVFH as the mode > iterator of the V mode. AKa VF. The mismatch will result in one ICE > similar as below: > > insn 30 29 31 2 (set (reg:HF 156 [ _2 ]) > (unspec:HF [ > (vec_select:HF (reg:RVVMF2HF 134 [ _1 ]) > (parallel [ > (const_int 0 [0]) > ])) > (reg:SI 67 vtype) > ] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1 > (nil)) > during RTL pass: vregs > compress_run-2.c:25:1: internal compiler error: in extract_insn, at > recog.cc:2812 > 0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char > const*) > ../../../gcc/gcc/rtl-error.cc:108 > 0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char > const*) > ../../../gcc/gcc/rtl-error.cc:116 > 0xb3a545 extract_insn(rtx_insn*) > ../../../gcc/gcc/recog.cc:2812 > 0x1010e9e instantiate_virtual_regs_in_insn > ../../../gcc/gcc/function.cc:1612 > 0x1010e9e instantiate_virtual_regs > ../../../gcc/gcc/function.cc:1995 > 0x1010e9e execute > ../../../gcc/gcc/function.cc:2042 > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression test. > 2. The rv64gcv build with glibc. > > There may be other similar issue(s) for the mismatch, we will take care > of them by test cases one by one. > > PR target/115456 > > gcc/ChangeLog: > > * config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V > which contains the VF_ZVFHMIN for alignment. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr115456-2.c: New test. > * gcc.target/riscv/rvv/base/pr115456-3.c: New test. > > Signed-off-by: Pan Li > --- > gcc/config/riscv/vector-iterators.md | 4 ++- > .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++ > .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++ > 3 files changed, 65 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > > diff --git a/gcc/config/riscv/vector-iterators.md > b/gcc/config/riscv/vector-iterators.md > index 47392d0da4c..43137a2a379 100644 > --- a/gcc/config/riscv/vector-iterators.md > +++ b/gcc/config/riscv/vector-iterators.md > @@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF]) > > (define_mode_iterator V [VI VF_ZVFHMIN]) > > +(define_mode_iterator V_ZVFH [VI VF]) > + > (define_mode_iterator V_VLS [V VLS]) > > -(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH]) > +(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH]) > > (define_mode_iterator V_VLSI [VI VLSI]) > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > new file mode 100644 > index 000..453e18b1c79 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > @@ -0,0 +1,31 @@ > +/* Test there is no ICE when compile. */ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d > -O3 -ftree-vectorize" } */ > + > +#include > +#include > + > +typedef _Float16 vnx4f __attribute__ ((vector_size (8))); > + > +vnx4f __attribute__ ((noinline, noclone)) > +test_5 (vnx4f x, vnx4f y) > +{ > + return __builtin_shufflevector (x, y, 1, 3, 6, 7); > +} > + > +int > +main (void) > +{ > + vnx4f test_5_x = {0, 1, 3, 4}; > + vnx4f test_5_y = {4, 5, 6, 7}; > + vnx4f test_5_except = {1, 4, 6, 7}; > + vnx4f test_5_real; > + test_5_real = test_5 (test_5_x, test_5_y); > + > + for (int i = 0; i < 4; i++) > +assert (test_5_real[i] == test_5_except[i]); > + > + return 0; > +} > + > +/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > new file mode 100644 > index 000..2c54f1d7538 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize" } */ > + > +#include > +#include > + > +typedef _Float16 vnx4f __attribute__ ((vector_size (8))); > + > +vnx4f __attribute__ ((noinline, noclone)) > +test_5 (vnx4f x, vnx4f y) > +{ > + return __builtin_shufflevector (x, y, 1, 3, 6, 7); > +} > + > +int > +main (void) > +{ > + vnx4f test_5_x = {0, 1, 3, 4}; > + vnx4f test_5_y = {4, 5, 6, 7}; > + vnx4f test_5_except = {1, 4, 6, 7}; > + vnx4f test_5_real; > + test_5_real = test_5 (test_5_x, test_5_y); > + > + for (int i = 0; i < 4; i++) > +assert (test_5_real[
[pushed] wwwdocs: gcc-11: Fix grammar - template alias parameters
Elsewhere, for example in the D documentation, template is used in singular, too. Pushed. Gerald --- htdocs/gcc-11/changes.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html index b4ecf3c0..3737af5b 100644 --- a/htdocs/gcc-11/changes.html +++ b/htdocs/gcc-11/changes.html @@ -585,7 +585,7 @@ You may also want to check out our User-defined attributes can now be used to annotate enum members, alias declarations, and function parameters. - Templates alias parameters can now be instantiated with basic types + Template alias parameters can now be instantiated with basic types such as int or void function(). The mixin construct can now be used as types in the form -- 2.45.2
[COMMITTED 01/16] ada: Remove unused name of aspect from Snames
From: Eric Botcazou gcc/ada/ * snames.ads-tmpl (Name_Storage_Model): Delete. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/snames.ads-tmpl | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/ada/snames.ads-tmpl b/gcc/ada/snames.ads-tmpl index 6cc66566907..699b8df5851 100644 --- a/gcc/ada/snames.ads-tmpl +++ b/gcc/ada/snames.ads-tmpl @@ -165,7 +165,6 @@ package Snames is Name_Relaxed_Initialization : constant Name_Id := N + $; Name_Stable_Properties : constant Name_Id := N + $; Name_Static_Predicate : constant Name_Id := N + $; - Name_Storage_Model : constant Name_Id := N + $; Name_Storage_Model_Type : constant Name_Id := N + $; Name_String_Literal : constant Name_Id := N + $; Name_Synchronization: constant Name_Id := N + $; -- 2.45.1
[COMMITTED 04/16] ada: Missing initialization of multidimensional array using sliding
From: Javier Miranda When a multidimensional array is initialized with an array aggregate, and inner dimensions of the array are initialized with array subaggregates using sliding, the code generated by the compiler does not initialize the inner dimensions of the array. gcc/ada/ * exp_aggr.adb (Must_Slide): Add missing support for multidimensional arrays. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_aggr.adb | 54 +++- 1 file changed, 33 insertions(+), 21 deletions(-) diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb index 796b0f1e0de..2686f5b3b82 100644 --- a/gcc/ada/exp_aggr.adb +++ b/gcc/ada/exp_aggr.adb @@ -154,8 +154,8 @@ package body Exp_Aggr is -- case the aggregate must slide, and we must introduce an intermediate -- temporary to hold it. -- - -- The same holds in an assignment to one-dimensional array of arrays, - -- when a component may be given with bounds that differ from those of the + -- The same holds in an assignment to multi-dimensional arrays, when + -- components may be given with bounds that differ from those of the -- component type. function Number_Of_Choices (N : Node_Id) return Nat; @@ -9550,32 +9550,44 @@ package body Exp_Aggr is elsif Is_Others_Aggregate (Aggr) then return False; - else - -- Sliding can only occur along the first dimension - -- If any the bounds of non-static sliding is required - -- to force potential range checks. + -- Check if sliding is required + else declare -Bounds1 : constant Range_Nodes := - Get_Index_Bounds (First_Index (Typ)); -Bounds2 : constant Range_Nodes := - Get_Index_Bounds (First_Index (Obj_Type)); +Obj_Index : Node_Id := First_Index (Obj_Type); +Obj_Bounds : Range_Nodes; +Typ_Index : Node_Id := First_Index (Typ); +Typ_Bounds : Range_Nodes; begin -if not Is_OK_Static_Expression (Bounds1.First) or else - not Is_OK_Static_Expression (Bounds2.First) or else - not Is_OK_Static_Expression (Bounds1.Last) or else - not Is_OK_Static_Expression (Bounds2.Last) -then - return True; +while Present (Typ_Index) loop + pragma Assert (Present (Obj_Index)); -else - return Expr_Value (Bounds1.First) /= Expr_Value (Bounds2.First) -or else - Expr_Value (Bounds1.Last) /= Expr_Value (Bounds2.Last); -end if; + Typ_Bounds := Get_Index_Bounds (Typ_Index); + Obj_Bounds := Get_Index_Bounds (Obj_Index); + + if not Is_OK_Static_Expression (Typ_Bounds.First) or else + not Is_OK_Static_Expression (Obj_Bounds.First) or else + not Is_OK_Static_Expression (Typ_Bounds.Last) or else + not Is_OK_Static_Expression (Obj_Bounds.Last) + then + return True; + + elsif Expr_Value (Typ_Bounds.First) + /= Expr_Value (Obj_Bounds.First) + or else Expr_Value (Typ_Bounds.Last) + /= Expr_Value (Obj_Bounds.Last) + then + return True; + end if; + + Next_Index (Typ_Index); + Next_Index (Obj_Index); +end loop; end; end if; + + return False; end Must_Slide; - -- 2.45.1
[COMMITTED 03/16] ada: Couple of small cleanups in semantic analysis of aspects
From: Eric Botcazou The first cleanup is to expose a consistent interface from Sem_Ch13 for the analysis of aspects at various points of the program. The second cleanup is to fix the awkward implementation of the analysis of the specification for the aspects Stable_Properties, Designated_Storage_Model, Storage_Model_Type and Aggregate, which are always delayed, and the incorrect placement of that of the aspect Local_Restrictions, which is never delayed. gcc/ada/ * freeze.adb (Freeze_All): Call Check_Aspects_At_End_Of_Declarations to perform the visibility check for aspects. * sem_ch13.ads (Check_Aspects_At_End_Of_Declarations): Declare. (Check_Aspect_At_Freeze_Point): Move to... (Check_Aspect_At_End_Of_Declarations): Move to... * sem_ch13.adb (Check_Aspect_At_Freeze_Point): ...here. (Check_Aspect_At_End_Of_Declarations): ...here. (Analyze_Aspect_Specifications): Remove peculiar processing for Stable_Properties, Designated_Storage_Model, Storage_Model_Type and Aggregate. Move that of Local_Restrictions around. Reset Aitem at the beginning of the loop for each aspect. (Check_Aspects_At_End_Of_Declarations): New procedure. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/freeze.adb | 17 + gcc/ada/sem_ch13.adb | 87 ++-- gcc/ada/sem_ch13.ads | 14 +++ 3 files changed, 58 insertions(+), 60 deletions(-) diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb index c4c524f4685..523b026cc21 100644 --- a/gcc/ada/freeze.adb +++ b/gcc/ada/freeze.adb @@ -2645,22 +2645,7 @@ package body Freeze is -- for a description of how we handle aspect visibility). elsif Has_Delayed_Aspects (E) then - declare - Ritem : Node_Id; - - begin - Ritem := First_Rep_Item (E); - while Present (Ritem) loop - if Nkind (Ritem) = N_Aspect_Specification - and then Entity (Ritem) = E - and then Is_Delayed_Aspect (Ritem) - then -Check_Aspect_At_End_Of_Declarations (Ritem); - end if; - - Next_Rep_Item (Ritem); - end loop; - end; + Check_Aspects_At_End_Of_Declarations (E); end if; -- If an incomplete type is still not frozen, this may be a diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb index d065dd8dfda..46a359fd7d6 100644 --- a/gcc/ada/sem_ch13.adb +++ b/gcc/ada/sem_ch13.adb @@ -150,6 +150,15 @@ package body Sem_Ch13 is -- is inserted before the freeze node, and the body of the function is -- inserted after the freeze node. + procedure Check_Aspect_At_End_Of_Declarations (ASN : Node_Id); + -- Performs the processing of an aspect at the freeze all point and issues + -- appropriate error messages if the visibility has indeed changed. ASN is + -- the N_Aspect_Specification node for the aspect. + + procedure Check_Aspect_At_Freeze_Point (ASN : Node_Id); + -- Performs the processing of an aspect at the freeze point. ASN is the + -- N_Aspect_Specification node for the aspect. + procedure Check_Pool_Size_Clash (Ent : Entity_Id; SP, SS : Node_Id); -- Called if both Storage_Pool and Storage_Size attribute definition -- clauses (SP and SS) are present for entity Ent. Issue error message. @@ -1669,7 +1678,6 @@ package body Sem_Ch13 is -- Local variables Aspect : Node_Id; - Aitem : Node_Id := Empty; Ent: Node_Id; L : constant List_Id := Aspect_Specifications (N); @@ -1722,7 +1730,12 @@ package body Sem_Ch13 is Loc : constant Source_Ptr := Sloc (Aspect); Nam : constant Name_Id:= Chars (Id); A_Id : constant Aspect_Id := Get_Aspect_Id (Nam); + +Aitem : Node_Id := Empty; +-- The associated N_Pragma or N_Attribute_Definition_Clause + Anod : Node_Id; +-- An auxiliary node Delay_Required : Boolean; -- Set False if delay is not required @@ -2949,19 +2962,6 @@ package body Sem_Ch13 is end if; end case; -if Delay_Required - and then (A_Id = Aspect_Stable_Properties - or else A_Id = Aspect_Designated_Storage_Model - or else A_Id = Aspect_Storage_Model_Type - or else A_Id = Aspect_Aggregate) - -- ??? It seems like we should do this for all aspects, not - -- just these, but that causes as-yet-undiagnosed regressions. - -then - Set_Has_Delayed_Aspects (E); - Set_Is_Delayed_Aspect (Aspect); -end if; - -- Check 13.1(9.2/5): A represe
[COMMITTED 02/16] ada: Allow implicit dereferenced for uses of 'Super
From: Justin Squirek This patch modifies the experimental 'Super attribute to allow an access-valued prefix to be equivalent to Prefix.all'Super. gcc/ada/ * sem_attr.adb: (Analyze_Attribute): Add check for dereference. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_attr.adb | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb index 22fbca45ac5..2563a92f2f0 100644 --- a/gcc/ada/sem_attr.adb +++ b/gcc/ada/sem_attr.adb @@ -6688,6 +6688,7 @@ package body Sem_Attr is Error_Msg_GNAT_Extension ("attribute %", Sloc (N)); Check_E0; + Check_Dereference; -- Verify that we are looking at a type with ancestors -- 2.45.1
[COMMITTED 06/16] ada: Crash checking accessibility level on private type
From: Justin Squirek This patch fixes an issue in the compiler whereby calculating a static accessibility level on a private type with an access discriminant resulted in a compile time crash when No_Dynamic_Accessibility_Checks is enabled. gcc/ada/ * accessibility.adb: (Accessibility_Level): Replace call Get_Full_View with call to Full_View since Get_Full_View only works with incomplete types. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/accessibility.adb | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/ada/accessibility.adb b/gcc/ada/accessibility.adb index 47b3a7af10a..da4d1d9ce2e 100644 --- a/gcc/ada/accessibility.adb +++ b/gcc/ada/accessibility.adb @@ -2227,7 +2227,11 @@ package body Accessibility is -- that of the type. elsif Ekind (Def_Ent) = E_Discriminant then - return Scope_Depth (Get_Full_View (Scope (Def_Ent))); + return Scope_Depth + (if Present (Full_View (Scope (Def_Ent))) then + Full_View (Scope (Def_Ent)) +else + Scope (Def_Ent)); end if; end if; -- 2.45.1
[COMMITTED 09/16] ada: Simplify handling of VxWorks-specific error codes for ENOENT
From: Jerome Guitton These error codes were defined on older versions of VxWorks (5, 6, 7 SR0540) and now they are either not defined or they fallback to ENOENT. To handle these cases without using complex tests against vxworks versions, leverage on __has_include and provide a fallback to ENOENT if these error codes are not defined. gcc/ada/ * sysdep.c (S_dosFsLib_FILE_NOT_FOUND, S_nfsLib_NFSERR_NOENT): New macros, falback to ENOENT when not already defined. (__gnat_is_file_not_found_error): Use these new macros to remove tests against VxWorks flavors. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sysdep.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/gcc/ada/sysdep.c b/gcc/ada/sysdep.c index 443b11f4302..254c736bec4 100644 --- a/gcc/ada/sysdep.c +++ b/gcc/ada/sysdep.c @@ -35,18 +35,35 @@ #ifdef __vxworks #include "vxWorks.h" #include "ioLib.h" -#if ! defined (VTHREADS) +/* VxWorks 5, 6 and 7 SR0540 expose error codes that need to be handled + as ENOENT. On later versions: + - either they are defined as ENOENT (vx7r2); + - or the corresponding system includes are not provided (Helix Cert). */ + +#if __has_include ("dosFsLib.h") +/* On helix-cert, this include is only provided for RTPs. */ #include "dosFsLib.h" #endif -#if ! defined (__RTP__) && (! defined (VTHREADS) || defined (__VXWORKSMILS__)) + +#ifndef S_dosFsLib_FILE_NOT_FOUND +#define S_dosFsLib_FILE_NOT_FOUND ENOENT +#endif + +#if __has_include ("nfsLib.h") +/* This include is not provided for RTPs or on helix-cert. */ # include "nfsLib.h" #endif + +#ifndef S_nfsLib_NFSERR_NOENT +#define S_nfsLib_NFSERR_NOENT ENOENT +#endif + #include "selectLib.h" #include "version.h" #if defined (__RTP__) # include "vwModNum.h" #endif /* __RTP__ */ -#endif +#endif /* __vxworks */ #ifdef __ANDROID__ #undef __linux__ @@ -912,14 +929,10 @@ __gnat_is_file_not_found_error (int errno_val) /* In the case of VxWorks, we also have to take into account various * filesystem-specific variants of this error. */ -#if ! defined (VTHREADS) && (_WRS_VXWORKS_MAJOR < 7) else if (errno_val == S_dosFsLib_FILE_NOT_FOUND) return 1; -#endif -#if ! defined (__RTP__) && (! defined (VTHREADS) || defined (__VXWORKSMILS__)) else if (errno_val == S_nfsLib_NFSERR_NOENT) return 1; -#endif #if defined (__RTP__) /* An RTP can return an NFS file not found, and the NFS bits must first be masked on to check the errno. */ -- 2.45.1
[COMMITTED 05/16] ada: Minor tweaks to processing of Aggregate aspect
From: Eric Botcazou The main one is to give the error for Aggregate applied to array types from Analyze_Aspects_At_Freeze_Point instead of Check_Aspect_At_Freeze_Point, as for the other aspects. The message is also changed to be more direct. gcc/ada/ * aspects.ads (Operational_Aspect): Alphabetize. * sem_ch13.ads (Analyze_Aspects_At_Freeze_Point): Fix description. * sem_ch13.adb (Analyze_Aspects_At_Freeze_Point) : Give the error for array types here instead of... (Analyze_Aspect_Specifications) : Adjust comment. (Check_Aspect_At_Freeze_Point) : ...here. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/aspects.ads | 4 ++-- gcc/ada/sem_ch13.adb | 17 - gcc/ada/sem_ch13.ads | 9 + 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads index 3cc62de3411..1acbec87824 100644 --- a/gcc/ada/aspects.ads +++ b/gcc/ada/aspects.ads @@ -325,12 +325,12 @@ package Aspects is -- List is currently incomplete ??? Operational_Aspect : constant array (Aspect_Id) of Boolean := - (Aspect_Constant_Indexing => True, + (Aspect_Aggregate => True, + Aspect_Constant_Indexing => True, Aspect_Default_Iterator => True, Aspect_Iterator_Element => True, Aspect_Iterable => True, Aspect_Variable_Indexing => True, - Aspect_Aggregate => True, others=> False); -- The following array indicates aspects for which multiple occurrences of diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb index 46a359fd7d6..caebe2e793e 100644 --- a/gcc/ada/sem_ch13.adb +++ b/gcc/ada/sem_ch13.adb @@ -1367,7 +1367,11 @@ package body Sem_Ch13 is Validate_Storage_Model_Type_Aspect (E, ASN); when Aspect_Aggregate => - null; + if Is_Array_Type (E) then +Error_Msg_N + ("aspect Aggregate may not be applied to array type", + ASN); + end if; when others => null; @@ -1384,7 +1388,7 @@ package body Sem_Ch13 is Next_Rep_Item (ASN); end loop; - -- Make a second pass for a Full_Access_Only entry + -- Make a second pass for a Full_Access_Only entry, see above why ASN := First_Rep_Item (E); while Present (ASN) loop @@ -4130,8 +4134,8 @@ package body Sem_Ch13 is end if; when Aspect_Aggregate => - -- We will be checking that the aspect is not specified on a - -- non-array type in Check_Aspect_At_Freeze_Point + -- We will be checking that the aspect is not specified on + -- an array type in Analyze_Aspects_At_Freeze_Point. Validate_Aspect_Aggregate (Expr); @@ -11378,11 +11382,6 @@ package body Sem_Ch13 is return; when Aspect_Aggregate => -if Is_Array_Type (Entity (ASN)) then - Error_Msg_N - ("aspect& can only be applied to non-array type", - Ident); -end if; Resolve_Aspect_Aggregate (Entity (ASN), Expression (ASN)); return; diff --git a/gcc/ada/sem_ch13.ads b/gcc/ada/sem_ch13.ads index 2bdca957826..aeacda833d1 100644 --- a/gcc/ada/sem_ch13.ads +++ b/gcc/ada/sem_ch13.ads @@ -312,10 +312,11 @@ package Sem_Ch13 is -- Quite an awkward approach, but this is an awkard requirement procedure Analyze_Aspects_At_Freeze_Point (E : Entity_Id); - -- Analyzes all the delayed aspects for entity E at freezing point. This - -- includes dealing with inheriting delayed aspects from the parent type - -- in the case where a derived type is frozen. Callers should check that - -- Has_Delayed_Aspects (E) is True before calling this routine. + -- Analyzes all the delayed aspects for entity E at the freeze point. Note + -- that this does not include dealing with inheriting delayed aspects from + -- the parent or base type in the case where a derived type or a subtype is + -- frozen. Callers should check that Has_Delayed_Aspects (E) is True before + -- calling this routine. procedure Check_Aspects_At_End_Of_Declarations (E : Entity_Id); -- Performs the processing described above at the freeze all point, and -- 2.45.1
[COMMITTED 08/16] ada: Minor tweak in Snames
From: Eric Botcazou gcc/ada/ * snames.ads-tmpl (Name_Present): Move to Repinfo section. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/snames.ads-tmpl | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/gcc/ada/snames.ads-tmpl b/gcc/ada/snames.ads-tmpl index 699b8df5851..d2f724f86ca 100644 --- a/gcc/ada/snames.ads-tmpl +++ b/gcc/ada/snames.ads-tmpl @@ -903,10 +903,6 @@ package Snames is Name_Warn : constant Name_Id := N + $; Name_Working_Storage: constant Name_Id := N + $; - -- used by Repinfo JSON I/O - - Name_Present: constant Name_Id := N + $; - -- Names of recognized attributes. The entries with the comment "Ada 83" -- are attributes that are defined in Ada 83, but not in Ada 95. These -- attributes are implemented in all Ada modes in GNAT. @@ -1372,6 +1368,7 @@ package Snames is Name_Discriminant : constant Name_Id := N + $; Name_Operands : constant Name_Id := N + $; + Name_Present : constant Name_Id := N + $; -- Other miscellaneous names used in front end -- Note that the UP_ prefix means use the rest of the name in uppercase, -- 2.45.1
[COMMITTED 13/16] ada: Do not create null GCC thunks
From: Eric Botcazou This prevents Gigi from creating null GCC thunks, i.e. thunks that have all their internal parameters set to zero, replacing them with aliases. They can arise in degenerate cases and null thunks would trip on an assertion in former_thunk_p when they are later optimized. gcc/ada/ PR ada/109817 * gcc-interface/trans.cc (maybe_make_gnu_thunk): Create an alias instead of a null thunk. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gcc-interface/trans.cc | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc index 93978c0f0ba..5256095dfeb 100644 --- a/gcc/ada/gcc-interface/trans.cc +++ b/gcc/ada/gcc-interface/trans.cc @@ -11093,6 +11093,16 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree gnu_thunk) tree gnu_interface_offset = gnu_interface_tag ? byte_position (gnu_interface_tag) : NULL_TREE; + /* But we generate a call to the Thunk_Entity in the thunk. */ + tree gnu_target += gnat_to_gnu_entity (Thunk_Entity (gnat_thunk), NULL_TREE, false); + + /* If the target is local, then thunk and target must have the same context + because cgraph_node::expand_thunk can only forward the static chain. */ + if (DECL_STATIC_CHAIN (gnu_target) + && DECL_CONTEXT (gnu_thunk) != DECL_CONTEXT (gnu_target)) +return false; + /* There are three ways to retrieve the offset between the interface view and the base object. Either the controlling type covers the interface type and the offset of the corresponding tag is fixed, in which case it @@ -1,6 +11121,15 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree gnu_thunk) virtual_value = 0; virtual_offset = NULL_TREE; indirect_offset = 0; + + /* Do not create a null thunk, instead make it an alias. */ + if (fixed_offset == 0) + { + SET_DECL_ASSEMBLER_NAME (gnu_thunk, DECL_ASSEMBLER_NAME (gnu_target)); + (void) cgraph_node::get_create (gnu_target); + (void) cgraph_node::create_alias (gnu_thunk, gnu_target); + return true; + } } else if (!gnu_interface_offset && !Is_Variable_Size_Record (gnat_controlling_type)) @@ -11132,16 +11151,6 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree gnu_thunk) indirect_offset = (HOST_WIDE_INT) (POINTER_SIZE / BITS_PER_UNIT); } - /* But we generate a call to the Thunk_Entity in the thunk. */ - tree gnu_target -= gnat_to_gnu_entity (Thunk_Entity (gnat_thunk), NULL_TREE, false); - - /* If the target is local, then thunk and target must have the same context - because cgraph_node::expand_thunk can only forward the static chain. */ - if (DECL_STATIC_CHAIN (gnu_target) - && DECL_CONTEXT (gnu_thunk) != DECL_CONTEXT (gnu_target)) -return false; - /* If the target returns by invisible reference and is external, apply the same transformation as Subprogram_Body_to_gnu here. */ if (TREE_ADDRESSABLE (TREE_TYPE (gnu_target)) -- 2.45.1
[COMMITTED 10/16] ada: Bad tree built for Obj.Discrim_Dep_Component'Loop_Entry in assertion
From: Steve Baird The Etype for an N_Selected_Component node usually should not match the Etype of the referenced component if the component is subject to a discriminant-dependent constraint. Instead Build_Actual_Subtype_Of_Component should be called. Fix a case where this rule was not being followed (because B_A_S_O_C is not called during preanalysis of a component selection), resulting in a tree that confused CodePeer because the subtype was wrong. gcc/ada/ * exp_attr.adb (Expand_Loop_Entry_Attribute): Ensure that Etype of the saved expression is set correctly. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_attr.adb | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb index 1396007a2d1..5c85b4912d2 100644 --- a/gcc/ada/exp_attr.adb +++ b/gcc/ada/exp_attr.adb @@ -1780,14 +1780,25 @@ package body Exp_Attr is begin Aux_Decl := Empty; --- Generate a nominal type for the constant when the prefix is of --- a constrained type. This is achieved by setting the Etype of --- the relocated prefix to its base type. Since the prefix is now --- the initialization expression of the constant, its freezing --- will produce a proper nominal type. - Temp_Expr := Relocate_Node (Pref); -Set_Etype (Temp_Expr, Base_Typ); + +-- For Etype (Temp_Expr) in some cases we cannot use either +-- Etype (Pref) or Base_Typ. So we set Etype (Temp_Expr) to null +-- and mark Temp_Expr as requiring analysis. Rather than trying +-- to sort out exactly when this is needed, we do it +-- unconditionally. +-- One case where this is needed is when +-- 1) Pref is an N_Selected_Component name that +--refers to a component which is subject to a +--discriminant-dependent constraint; and +-- 2) The prefix of that N_Selected_Component refers to a +--formal parameter with an unconstrained subtype; and +-- 3) Pref has only been preanalyzed (so that +--Build_Actual_Subtype_Of_Component has not been called +--and Etype (Pref) equals the Etype of the component). + +Set_Etype (Temp_Expr, Empty); +Set_Analyzed (Temp_Expr, False); -- Generate: --Temp : constant Base_Typ := Pref; -- 2.45.1
[COMMITTED 12/16] ada: Typo and indentation fix
Fixes typo in comments and 2 instances of bad indentation. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_entity): Typo fix. (gnat_to_gnu_component_type): Indent fix. * gcc-interface/gigi.h (build_call_alloc_dealloc): Typo fix. * gcc-interface/utils.cc (make_dummy_type): Typo fix. * gcc-interface/utils2.cc (gnat_protect_expr): Indent fix. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gcc-interface/decl.cc | 8 gcc/ada/gcc-interface/gigi.h| 2 +- gcc/ada/gcc-interface/utils.cc | 2 +- gcc/ada/gcc-interface/utils2.cc | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc index 8b72c96c439..23983742605 100644 --- a/gcc/ada/gcc-interface/decl.cc +++ b/gcc/ada/gcc-interface/decl.cc @@ -1384,7 +1384,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) volatile_flag = false; gnu_size = NULL_TREE; - /* In case this was a aliased object whose nominal subtype is + /* In case this was an aliased object whose nominal subtype is unconstrained, the pointer above will be a thin pointer and build_allocator will automatically make the template. @@ -2103,7 +2103,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) 1. the array type (suffix XUA) containing the actual data, - 2. the template type (suffix XUB) containng the bounds, + 2. the template type (suffix XUB) containing the bounds, 3. the fat pointer type (suffix XUP) representing a pointer or a reference to the unconstrained array type: @@ -5445,8 +5445,8 @@ gnat_to_gnu_component_type (Entity_Id gnat_array, bool definition, if (gnu_comp_align > TYPE_ALIGN (gnu_type)) gnu_comp_align = 0; } - else -gnu_comp_align = 0; + else + gnu_comp_align = 0; gnu_type = maybe_pad_type (gnu_type, gnu_comp_size, gnu_comp_align, gnat_array, true, definition, true); diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h index f3205a8a25d..6ed74d6879e 100644 --- a/gcc/ada/gcc-interface/gigi.h +++ b/gcc/ada/gcc-interface/gigi.h @@ -906,7 +906,7 @@ extern tree build_call_alloc_dealloc (tree gnu_obj, tree gnu_size, Entity_Id gnat_pool, Node_Id gnat_node); /* Build a GCC tree to correspond to allocating an object of TYPE whose - initial value if INIT, if INIT is nonzero. Convert the expression to + initial value is INIT, if INIT is nonzero. Convert the expression to RESULT_TYPE, which must be some type of pointer. Return the tree. GNAT_PROC and GNAT_POOL optionally give the procedure to call and diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc index ae520542ace..771cb1a17ca 100644 --- a/gcc/ada/gcc-interface/utils.cc +++ b/gcc/ada/gcc-interface/utils.cc @@ -499,7 +499,7 @@ make_dummy_type (Entity_Id gnat_type) if (No (gnat_equiv)) gnat_equiv = gnat_type; - /* If it there already a dummy type, use that one. Else make one. */ + /* If there is already a dummy type, use that one. Else make one. */ if (PRESENT_DUMMY_NODE (gnat_equiv)) return GET_DUMMY_NODE (gnat_equiv); diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc index 4b7e2739f6a..70271cf2836 100644 --- a/gcc/ada/gcc-interface/utils2.cc +++ b/gcc/ada/gcc-interface/utils2.cc @@ -2884,7 +2884,7 @@ gnat_protect_expr (tree exp) if (code == NON_LVALUE_EXPR || CONVERT_EXPR_CODE_P (code) || code == VIEW_CONVERT_EXPR) - return build1 (code, type, gnat_protect_expr (TREE_OPERAND (exp, 0))); +return build1 (code, type, gnat_protect_expr (TREE_OPERAND (exp, 0))); /* If we're indirectly referencing something, we only need to protect the address since the data itself can't change in these situations. */ -- 2.45.1
[COMMITTED 11/16] ada: Fix parts of classification of aspects
From: Eric Botcazou Many aspects are (correctly) marked as GNAT-specific but nevertheless not listed in the Implementation_Defined_Aspect array, so this aligns the two sides and also removes Default_Initial_Condition and Object_Size from the list, since they are defined in Ada 2022. This also moves No_Controlled_Parts and No_Task_Parts to the subclass of boolean aspects, and completes the list of nonoverridable aspects defined in Ada 2022. gcc/ada/ * aspects.ads (Aspect_Id): Alphabetize, remove the GNAT tag from Default_Initial_Condition and Object_Size, move No_Controlled_Parts and No_Task_Parts to boolean subclass. (Nonoverridable_Aspect_Id): Add missing Ada 2022 aspects. (Implementation_Defined_Aspect): Add all missing aspects, remove Max_Entry_Queue_Length and Object_Size (Aspect_Argument): Remove specific entries for No_Controlled_Parts and No_Task_Parts, list boolean aspects last. (Is_Representation_Aspect ): Move boolean aspects last. (Aspect_Names): Alphabetize. * sem_ch13.adb (Analyze_Aspect_Disable_Controlled): Adjust. (Analyze_Aspect_Specifications): Move around processing for No_Controlled_Parts and No_Task_Parts. (Check_Aspect_At_Freeze_Point): Remove specific entries for No_Controlled_Parts and No_Task_Parts Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/aspects.ads | 94 gcc/ada/sem_ch13.adb | 69 +++- 2 files changed, 101 insertions(+), 62 deletions(-) diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads index d4aafb1a4f1..202d42193d1 100644 --- a/gcc/ada/aspects.ads +++ b/gcc/ada/aspects.ads @@ -64,10 +64,14 @@ with Types; use Types; package Aspects is - -- Type defining recognized aspects + -- Type enumerating the recognized aspects. The GNAT tag must be in keeping + -- with the Implementation_Defined_Aspect array below. type Aspect_Id is (No_Aspect,-- Dummy entry for no aspect + + -- The following aspects do not have a (static) boolean value + Aspect_Abstract_State,-- GNAT Aspect_Address, Aspect_Aggregate, @@ -81,7 +85,7 @@ package Aspects is Aspect_Convention, Aspect_CPU, Aspect_Default_Component_Value, - Aspect_Default_Initial_Condition, -- GNAT + Aspect_Default_Initial_Condition, Aspect_Default_Iterator, Aspect_Default_Storage_Pool, Aspect_Default_Value, @@ -104,8 +108,8 @@ package Aspects is Aspect_Integer_Literal, Aspect_Interrupt_Priority, Aspect_Invariant, -- GNAT - Aspect_Iterator_Element, Aspect_Iterable, -- GNAT + Aspect_Iterator_Element, Aspect_Link_Name, Aspect_Linker_Section,-- GNAT Aspect_Local_Restrictions,-- GNAT @@ -113,9 +117,7 @@ package Aspects is Aspect_Max_Entry_Queue_Depth, -- GNAT Aspect_Max_Entry_Queue_Length, Aspect_Max_Queue_Length, -- GNAT - Aspect_No_Controlled_Parts, - Aspect_No_Task_Parts, -- GNAT - Aspect_Object_Size, -- GNAT + Aspect_Object_Size, Aspect_Obsolescent, -- GNAT Aspect_Output, Aspect_Part_Of, -- GNAT @@ -186,10 +188,10 @@ package Aspects is Aspect_Atomic, Aspect_Atomic_Components, Aspect_Constant_After_Elaboration,-- GNAT - Aspect_Disable_Controlled,-- GNAT - Aspect_Discard_Names, Aspect_CUDA_Device, -- GNAT Aspect_CUDA_Global, -- GNAT + Aspect_Disable_Controlled,-- GNAT + Aspect_Discard_Names, Aspect_Effective_Reads, -- GNAT Aspect_Effective_Writes, -- GNAT Aspect_Exclusive_Functions, @@ -206,9 +208,11 @@ package Aspects is Aspect_Interrupt_Handler, Aspect_Lock_Free, -- GNAT Aspect_No_Caching,-- GNAT + Aspect_No_Controlled_Parts, Aspect_No_Inline, -- GNAT Aspect_No_Return, Aspect_No_Tagged_Streams, -- GNAT + Aspect_No_Task_Parts, -- GNAT Aspect_Pack, Aspect_Persistent_BSS,-- GNAT Aspect_Preelaborable_Initialization, @@ -242,12 +246,13 @@ package Aspects is | Aspect_Constant_Indexing | Aspect_Default_Iterator | Aspect_Implicit_Dereference + | Aspect_Integer_Literal | Aspect_Iterator_Element | Aspect_Max_Entry_Queue_Length | A
[COMMITTED 16/16] ada: Do not include target-specific makefile fragments
From: Eric Botcazou They are unused in this context. gcc/ada/ * gcc-interface/Makefile.in (tmake_file): Remove all references. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gcc-interface/Makefile.in | 6 -- 1 file changed, 6 deletions(-) diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in index 0666fc00bb8..29db89c6f52 100644 --- a/gcc/ada/gcc-interface/Makefile.in +++ b/gcc/ada/gcc-interface/Makefile.in @@ -148,7 +148,6 @@ host_vendor=@host_vendor@ host_os=@host_os@ target_cpu_default = @target_cpu_default@ xmake_file = @xmake_file@ -tmake_file = @tmake_file@ #version=`sed -e 's/.*\"\([^ \"]*\)[ \"].*/\1/' < $(srcdir)/version.c` #mainversion=`sed -e 's/.*\"\([0-9]*\.[0-9]*\).*/\1/' < $(srcdir)/version.c` @@ -209,11 +208,6 @@ all: all.indirect # This tells GNU Make version 3 not to put all variables in the environment. .NOEXPORT: -# target overrides -ifneq ($(tmake_file),) -include $(tmake_file) -endif - # host overrides ifneq ($(xmake_file),) include $(xmake_file) -- 2.45.1
[COMMITTED 07/16] ada: Add prototype for mutably tagged types
From: Justin Squirek This patch implements mutably tagged types via the new Size'Class aspect. gcc/ada/ * doc/gnat_rm/gnat_language_extensions.rst: Add documentation for mutably tagged type feature. * aspects.ads: Add registration for 'Size'Class. * einfo.ads: Add documentation for new components Class_Wide_Equivalent_Type and Is_Mutably_Tagged_Type. * exp_aggr.adb (Gen_Assign): Assume associated mutably tagged type when class-wide equivalent type is encountered. (Contains_Mutably_Tagged_Type): New subprogram. (Convert_To_Positional): Assume associated mutably tagged type when class-wide equivalent type is encountered. (Is_Static_Element): Assume associated mutably tagged type when class-wide equivalent type is encountered. (Expand_Array_Aggregate): Assume associated mutably tagged type when class-wide equivalent type is encountered. (Expand_Record_Aggregate): Force mutably tagged records to be expanded into assignments. * exp_ch3.adb (Build_Array_Init_Proc): Assume associated mutably tagged type when class-wide equivalent type is encountered. (Simple_Initialization_OK): Disallow simple initialization for class-wide equivalent types. (Build_Init_Statements): Assume associated mutably tagged type when class-wide equivalent type is encountered. (Expand_Freeze_Array_Type): Ignore building of record init procs for mutably tagged types. (Expand_N_Full_Type_Declaration): Replace mutably tagged type declarations with their associated class-wide equivalent types. (Default_Initialize_Object): Add special handling for mutably tagged types. * exp_ch4.adb (Expand_N_Allocator): Add initialization for mutably tagged types. (Expand_Record_Equality): Generate mutably tagged unchecked conversions. * exp_ch5.adb (Expand_N_Assignment_Statement): Generate a special assignment case for class-wide equivalent types which does tag assignments and ignores certain checks. * exp_ch6.adb (Expand_Call_Helper): Propagate constrained extra formal actuals for mutably tagged types. * exp_ch7.adb (Make_Init_Call): Handle mutably tagged type initialization. * exp_util.adb (Make_CW_Equivalent_Type): Modify to handle mutably tagged objects which contain no initialization expression. (Make_Subtype_From_Expr): Modify call to Make_CW_Equivalent_Type. * exp_util.ads (Make_CW_Equivalent_Type): Move declaration from body to spec. * freeze.adb (Size_Known): No longer return false automatically when a class-wide type is encountered. (Freeze_Entity): Ignore error messages about size not being known for mutably tagged types. * gen_il-fields.ads: Register new fields Class_Wide_Equivalent_Type and Is_Mutably_Tagged_Type. * gen_il-gen-gen_entities.adb: Register new fields Class_Wide_Equivalent_Type and Is_Mutably_Tagged_Type for type entities. * mutably_tagged.adb, mutably_tagged.ads (Corresponding_Mutably_Tagged_Type): New subprogram. (Depends_On_Mutably_Tagged_Ext_Comp): New subprogram. (Get_Corresponding_Mutably_Tagged_Type_If_Present): New subprogram. (Get_Corresponding_Tagged_Type_If_Present): New subprogram. (Is_Mutably_Tagged_Conversion): New subprogram. (Is_Mutably_Tagged_CW_Equivalent_Type): New subprogram. (Make_Mutably_Tagged_Conversion): New subprogram. (Make_CW_Size_Compile_Check): New subprogram. (Make_Mutably_Tagged_CW_Check): New subprogram. * sem_aggr.adb (Resolve_Array_Aggregate): Skip tag checks for class-wide equivalent types. (Resolve_Aggr_Expr): Assume associated mutably tagged type when class-wide equivalent type is encountered. * sem_attr.adb (Analyze_Attribute): Allow 'Tag on mutably tagged types. (Resolve_Attribute): Detect errors for dependence of mutably tagged extension type component. * sem_ch12.adb (Instantiate_Object): Detect errors for dependence of mutably tagged extension type component. * sem_ch13.adb (Analyze_One_Aspect): Propagate 'Size'Class to class-wide type. (Analyze_Attribute_Definition_Clause): Add handling of 'Size'Class by generating class-wide equivalent types and checking for illegal uses. * sem_ch2.adb (Analyze_Identifier): Generate unchecked conversion for class-wide equivalent types. * sem_ch3.adb (Analyze_Component_Declaration): Avoid unconstrained errors on mutably tagged types. (Analyze_Object_Declaration): Rewrite declarations of mutably tagged types to use class-wide equivalent types. (Array_Type_Declaration): Modify arra
[COMMITTED 15/16] ada: Fix return mechanism reported by -gnatRm
From: Eric Botcazou The return mechanism of functions is reported when the -gnatRm switch is specified, but it is incorrect when the result type is not a by-reference type in the language sense but is nevertheless returned by reference. gcc/ada/ * gcc-interface/decl.cc: Include function.h. (gnat_to_gnu_param): Minor comment tweaks. (gnat_to_gnu_subprog_type): Take into account the default for the computation of the return mechanism. Give a warning if a by-copy specified mechanism cannot be honored. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gcc-interface/decl.cc | 34 +++--- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc index 23983742605..aa31a18 100644 --- a/gcc/ada/gcc-interface/decl.cc +++ b/gcc/ada/gcc-interface/decl.cc @@ -27,6 +27,7 @@ #include "system.h" #include "coretypes.h" #include "target.h" +#include "function.h" #include "tree.h" #include "gimple-expr.h" #include "stringpool.h" @@ -5703,6 +5704,7 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree gnu_param_type, bool first, input_location = saved_location; + /* Warn if we are asked to pass by copy but cannot. */ if (mech == By_Copy && (by_ref || by_component_ptr)) post_error ("??cannot pass & by copy", gnat_param); @@ -5735,12 +5737,13 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree gnu_param_type, bool first, DECL_RESTRICTED_ALIASING_P (gnu_param) = restricted_aliasing_p; Sloc_to_locus (Sloc (gnat_param), &DECL_SOURCE_LOCATION (gnu_param)); - /* If no Mechanism was specified, indicate what we're using, then - back-annotate it. */ + /* If no Mechanism was specified, indicate what we will use. */ if (mech == Default) mech = (by_ref || by_component_ptr) ? By_Reference : By_Copy; + /* Back-annotate the mechanism in all cases. */ Set_Mechanism (gnat_param, mech); + return gnu_param; } @@ -6129,11 +6132,6 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition, associate_subprog_with_dummy_type (gnat_subprog, gnu_return_type); incomplete_profile_p = true; } - - if (kind == E_Function) - Set_Mechanism (gnat_subprog, return_by_direct_ref_p -|| return_by_invisi_ref_p -? By_Reference : By_Copy); } /* A procedure (something that doesn't return anything) shouldn't be @@ -6636,6 +6634,28 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition, if (warn_shadow) post_error ("'G'C'C builtin not found for&!??", gnat_subprog); } + + /* Finally deal with the return mechanism for a function. */ + if (kind == E_Function) + { + /* We return by reference either if this is required by the semantics +of the language or if this is the default for the function. */ + const bool by_ref = return_by_direct_ref_p + || return_by_invisi_ref_p + || aggregate_value_p (gnu_return_type, gnu_type); + Mechanism_Type mech = Mechanism (gnat_subprog); + + /* Warn if we are asked to return by copy but cannot. */ + if (mech == By_Copy && by_ref) + post_error ("??cannot return from & by copy", gnat_subprog); + + /* If no mechanism was specified, indicate what we will use. */ + if (mech == Default) + mech = by_ref ? By_Reference : By_Copy; + + /* Back-annotate the mechanism in all cases. */ + Set_Mechanism (gnat_subprog, mech); + } } *param_list = gnu_param_list; -- 2.45.1
[COMMITTED 14/16] ada: Skip subprogram body entities inside scopes
From: Yannick Moy Entities of kind E_Subprogram_Body, used on bodies of subprograms for which there is a separate declaration, have been added in the entities linked from a scope in order to get the representation information on their enclosed object and type declarations. Skip these entities in gigi. gcc/ada/ * gcc-interface/trans.cc (elaborate_all_entities_for_package) (process_freeze_entity): Skip entities of kind E_Subprogram_Body. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gcc-interface/trans.cc | 8 1 file changed, 8 insertions(+) diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc index 5256095dfeb..e68fb3fd776 100644 --- a/gcc/ada/gcc-interface/trans.cc +++ b/gcc/ada/gcc-interface/trans.cc @@ -9321,6 +9321,10 @@ elaborate_all_entities_for_package (Entity_Id gnat_package) if (kind == E_Package_Body) continue; + /* Skip subprogram bodies. */ + if (kind == E_Subprogram_Body) + continue; + /* Skip limited views that point back to the main unit. */ if (IN (kind, Incomplete_Kind) && From_Limited_With (gnat_entity) @@ -9427,6 +9431,10 @@ process_freeze_entity (Node_Id gnat_node) if (Is_Subprogram (gnat_entity) && Present (Interface_Alias (gnat_entity))) return; + /* Skip subprogram bodies. */ + if (kind == E_Subprogram_Body) +return; + /* Check for an old definition if this isn't an object with address clause, since the saved GCC tree is the address expression in that case. */ gnu_old -- 2.45.1
Re: [Patch, fortran] PR59104
Hi Paul, to me this looks fine. Thanks for the patch. Me having been away for some time from gfortran, I recommend you wait for Harald's ok, too. Regards, Andre On Thu, 13 Jun 2024 22:43:03 +0100 Paul Richard Thomas wrote: > Hi Both, > > Thanks for the highly constructive comments. I think that I have > incorporated them fully in the attached. > > OK for mainline and ...? > > Paul > > > On Mon, 10 Jun 2024 at 08:19, Andre Vehreschild wrote: > > > Hi Paul, > > > > while looking at your patch I see calls to gfc_add_init_cleanup (..., > > back), > > while the function signature is gfc_add_init_cleanup (..., bool front). > > This > > slightly confuses me. I would at least expect to see > > gfc_add_init_cleanup(..., > > !back) calls. Just to get the semantics right. > > > > Then I wonder why not doing: > > > > diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc > > index bafe8cbc5bc..97ace8c778e 100644 > > --- a/gcc/fortran/dependency.cc > > +++ b/gcc/fortran/dependency.cc > > @@ -2497,3 +2497,63 @@ gfc_omp_expr_prefix_same (gfc_expr *lexpr, gfc_expr > > *rexpr) > >return true; > > } > > + > > + > > +/* gfc_function_dependency returns true for non-dummy symbols with > > dependencies > > + on an old-fashioned function result (ie. proc_name = > > proc_name->result). > > + This is used to ensure that initialization code appears after the > > function > > + result is treated and that any mutual dependencies between these > > symbols are > > + respected. */ > > + > > +static bool > > +dependency_fcn (gfc_expr *e, gfc_symbol *sym, > > +int *f ATTRIBUTE_UNUSED) > > +{ > > + return (e && e->expr_type == EXPR_VARIABLE > > + && e->symtree > > + && e->symtree->n.sym == sym); > > +} > > > > Instead of the multiple if-statements? > > > > + > > +bool > > +gfc_function_dependency (gfc_symbol *sym, gfc_symbol *proc_name) > > +{ > > + bool front = false; > > + > > + if (proc_name && proc_name->attr.function > > + && proc_name == proc_name->result > > + && !(sym->attr.dummy || sym->attr.result)) > > +{ > > + if (sym->as && sym->as->type == AS_EXPLICIT) > > + { > > + for (int dim = 0; dim < sym->as->rank; dim++) > > + { > > + if (sym->as->lower[dim] > > + && sym->as->lower[dim]->expr_type != EXPR_CONSTANT) > > + front = gfc_traverse_expr (sym->as->lower[dim], proc_name, > > + dependency_fcn, 0); > > + if (front) > > + break; > > + if (sym->as->upper[dim] > > + && sym->as->upper[dim]->expr_type != EXPR_CONSTANT) > > + front = gfc_traverse_expr (sym->as->upper[dim], proc_name, > > + dependency_fcn, 0); > > + if (front) > > + break; > > + } > > + } > > + > > + if (sym->ts.type == BT_CHARACTER > > + && sym->ts.u.cl && sym->ts.u.cl->length > > + && sym->ts.u.cl->length->expr_type != EXPR_CONSTANT) > > + front = gfc_traverse_expr (sym->ts.u.cl->length, proc_name, > > + dependency_fcn, 0); > > > > This can overwrite a previous front == true, right? Is this intended? > > > > +} > > + return front; > > + } > > > > The rest - besides the front-back confusion - looks fine to me. Thanks for > > the > > patch. > > > > Regards, > > Andre > > > > On Sun, 9 Jun 2024 07:14:39 +0100 > > Paul Richard Thomas wrote: > > > > > Hi All, > > > > > > The attached fixes a problem that, judging by the comments, has been > > looked > > > at periodically over the last ten years but just looked to be too > > > fiendishly complicated to fix. This is not in small part because of the > > > confusing ordering of dummies in the tlink chain and the unintuitive > > > placement of all deferred initializations to the front of the init chain > > in > > > the wrapped block. > > > > > > The result of the existing ordering is that the initialization code for > > > non-dummy variables that depends on the function result occurs before any > > > initialization code for the function result itself. The fix ensures that: > > > (i) These variables are placed correctly in the tlink chain, respecting > > > inter-dependencies; and (ii) The dependent initializations are placed at > > > the end of the wrapped block init chain. The details appear in the > > > comments in the patch. It is entirely possible that a less clunky fix > > > exists but I failed to find it. > > > > > > OK for mainline? > > > > > > Regards > > > > > > Paul > > > > > > -- > > Andre Vehreschild * Email: vehre ad gmx dot de > > -- Andre Vehreschild * Email: vehre ad gmx dot de
[pushed] doc: Remove reference to Interix
This apparently was missed when support for Interix was removed in 2016. gcc: PR target/69374 * doc/install.texi (Specific): Remove stale reference to Interix. --- gcc/doc/install.texi | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 0baba5e594d..38c93f067b5 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -5178,8 +5178,7 @@ SuperH (sh-wince-pe), and MIPS (mips-wince-pe). @subheading Other Windows Platforms GCC no longer supports Windows NT on the Alpha or PowerPC. -GCC no longer supports the Windows POSIX subsystem. However, it does -support the Interix subsystem. See above. +GCC no longer supports the Windows POSIX subsystem. Old target names including *-*-winnt and *-*-windowsnt are no longer used. -- 2.45.2
RE: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch
Committed, thanks Kito. Pan -Original Message- From: Kito Cheng Sent: Friday, June 14, 2024 3:33 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch LGTM, thanks :) On Fri, Jun 14, 2024 at 3:02 PM wrote: > > From: Pan Li > > We have vec_extract pattern which takes ZVFHMIN as the mode > iterator of the V mode. Aka VF_ZVFHMIN iterator. But it will > expand to pred_extract_first pattern which takes the ZVFH as the mode > iterator of the V mode. AKa VF. The mismatch will result in one ICE > similar as below: > > insn 30 29 31 2 (set (reg:HF 156 [ _2 ]) > (unspec:HF [ > (vec_select:HF (reg:RVVMF2HF 134 [ _1 ]) > (parallel [ > (const_int 0 [0]) > ])) > (reg:SI 67 vtype) > ] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1 > (nil)) > during RTL pass: vregs > compress_run-2.c:25:1: internal compiler error: in extract_insn, at > recog.cc:2812 > 0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char > const*) > ../../../gcc/gcc/rtl-error.cc:108 > 0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char > const*) > ../../../gcc/gcc/rtl-error.cc:116 > 0xb3a545 extract_insn(rtx_insn*) > ../../../gcc/gcc/recog.cc:2812 > 0x1010e9e instantiate_virtual_regs_in_insn > ../../../gcc/gcc/function.cc:1612 > 0x1010e9e instantiate_virtual_regs > ../../../gcc/gcc/function.cc:1995 > 0x1010e9e execute > ../../../gcc/gcc/function.cc:2042 > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression test. > 2. The rv64gcv build with glibc. > > There may be other similar issue(s) for the mismatch, we will take care > of them by test cases one by one. > > PR target/115456 > > gcc/ChangeLog: > > * config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V > which contains the VF_ZVFHMIN for alignment. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr115456-2.c: New test. > * gcc.target/riscv/rvv/base/pr115456-3.c: New test. > > Signed-off-by: Pan Li > --- > gcc/config/riscv/vector-iterators.md | 4 ++- > .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++ > .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++ > 3 files changed, 65 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > > diff --git a/gcc/config/riscv/vector-iterators.md > b/gcc/config/riscv/vector-iterators.md > index 47392d0da4c..43137a2a379 100644 > --- a/gcc/config/riscv/vector-iterators.md > +++ b/gcc/config/riscv/vector-iterators.md > @@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF]) > > (define_mode_iterator V [VI VF_ZVFHMIN]) > > +(define_mode_iterator V_ZVFH [VI VF]) > + > (define_mode_iterator V_VLS [V VLS]) > > -(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH]) > +(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH]) > > (define_mode_iterator V_VLSI [VI VLSI]) > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > new file mode 100644 > index 000..453e18b1c79 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c > @@ -0,0 +1,31 @@ > +/* Test there is no ICE when compile. */ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d > -O3 -ftree-vectorize" } */ > + > +#include > +#include > + > +typedef _Float16 vnx4f __attribute__ ((vector_size (8))); > + > +vnx4f __attribute__ ((noinline, noclone)) > +test_5 (vnx4f x, vnx4f y) > +{ > + return __builtin_shufflevector (x, y, 1, 3, 6, 7); > +} > + > +int > +main (void) > +{ > + vnx4f test_5_x = {0, 1, 3, 4}; > + vnx4f test_5_y = {4, 5, 6, 7}; > + vnx4f test_5_except = {1, 4, 6, 7}; > + vnx4f test_5_real; > + test_5_real = test_5 (test_5_x, test_5_y); > + > + for (int i = 0; i < 4; i++) > +assert (test_5_real[i] == test_5_except[i]); > + > + return 0; > +} > + > +/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > new file mode 100644 > index 000..2c54f1d7538 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize" } */ > + > +#include > +#include > + > +typedef _Float16 vnx4f __attribute__ ((vector_size (8))); > + > +vnx4f __attribute__ ((noinline, noclone)) > +test_5 (vnx4f x, vnx4f y) > +{ > + return __builtin_shufflev
Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
On Wed, Jun 12, 2024 at 2:38 PM wrote: > > From: Pan Li > > After we support the scalar unsigned form 1 and 2, we would like > to introduce more forms include the branch and branchless. There > are forms 3-10 list as below: > > Form 3: > #define SAT_SUB_U_3(T) \ > T sat_sub_u_3_##T (T x, T y) \ > { \ > return x > y ? x - y : 0; \ > } > > Form 4: > #define SAT_SUB_U_4(T) \ > T sat_sub_u_4_##T (T x, T y) \ > { \ > return x >= y ? x - y : 0; \ > } > > Form 5: > #define SAT_SUB_U_5(T) \ > T sat_sub_u_5_##T (T x, T y) \ > { \ > return x < y ? 0 : x - y; \ > } > > Form 6: > #define SAT_SUB_U_6(T) \ > T sat_sub_u_6_##T (T x, T y) \ > { \ > return x <= y ? 0 : x - y; \ > } > > Form 7: > #define SAT_SUB_U_7(T) \ > T sat_sub_u_7_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return ret & (T)(overflow - 1); \ > } > > Form 8: > #define SAT_SUB_U_8(T) \ > T sat_sub_u_8_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return ret & (T)-(!overflow); \ > } > > Form 9: > #define SAT_SUB_U_9(T) \ > T sat_sub_u_9_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return overflow ? 0 : ret; \ > } > > Form 10: > #define SAT_SUB_U_10(T) \ > T sat_sub_u_10_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return !overflow ? ret : 0; \ > } > > Take form 10 as example: > > SAT_SUB_U_10(uint64_t); > > Before this patch: > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y) > { > unsigned char _1; > unsigned char _2; > uint8_t _3; > __complex__ unsigned char _6; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _6 = .SUB_OVERFLOW (x_4(D), y_5(D)); > _2 = IMAGPART_EXPR <_6>; > if (_2 == 0) > goto ; [50.00%] > else > goto ; [50.00%] > ;;succ: 3 > ;;4 > > ;; basic block 3, loop depth 0 > ;;pred: 2 > _1 = REALPART_EXPR <_6>; > ;;succ: 4 > > ;; basic block 4, loop depth 0 > ;;pred: 2 > ;;3 > # _3 = PHI <0(2), _1(3)> > return _3; > ;;succ: EXIT > > } > > After this patch: > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y) > { > uint8_t _3; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call] > return _3; > ;;succ: EXIT > > } > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test with newlib. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap test. > 4. The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: Add more match for unsigned sat_sub. > * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new > func impl to match phi node for .SAT_SUB. > (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB > for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 25 +++-- > gcc/tree-ssa-math-opts.cc | 33 + > 2 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5cfe81e80b3..66e411b3359 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > /* Unsigned saturation sub, case 1 (branch with gt): > SAT_U_SUB = X > Y ? X - Y : 0 */ > (match (unsigned_integer_sat_sub @0 @1) > - (cond (gt @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > /* Unsigned saturation sub, case 2 (branch with ge): > SAT_U_SUB = X >= Y ? X - Y : 0. */ > (match (unsigned_integer_sat_sub @0 @1) > - (cond (ge @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > @@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > +/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW. > */ > +(match (unsigned_integer_sat_sub @0 @1) > + (bit_and:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1)) > + (plus:c (imagpart @2) integer_minus_onep)) :c shouldn't be necessary on the plus > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > + && types_match (type, @0, @1 > + > +/* Unsigned saturation sub, case 6 (branchless mult with .SUB_OVERFLOW. */ > +(match (unsigned_integer_sat_sub @0 @1) > + (mult:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1)) > + (bit_xor:c (imagpart @2) integer_onep)) or on the bit_xor OK with those changes. Richard. > + (if (INTEGRAL_TYPE_P (
Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV
On Fri, Jun 14, 2024 at 8:58 AM Liu, Hongtao wrote: > > > > > -Original Message- > > From: Richard Biener > > Sent: Friday, June 14, 2024 2:52 PM > > To: Kong, Lingling > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Uros > > Bizjak > > Subject: Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV > > > > On Fri, Jun 14, 2024 at 5:12 AM Kong, Lingling > > wrote: > > > > > > APX CFCMOV[1] feature implements conditionally faulting which means > > > that all memory faults are suppressed when the condition code > > > evaluates to false and load or store a memory operand. Now we could load > > or store a memory operand may trap or fault for conditional move. > > > > > > In middle-end, now we don't support a conditional move if we knew that a > > load from A or B could trap or fault. > > > > What's the cost of suppressing a fault? ISTR that for example fault > > suppression > > for vector masked load/store is quite expensive, so when this is for example > Yes, avx512 masked load/store, the cost is expensive when memory is invalid. > > done in a loop where there's always a fault that's suppressed you can see > > 1000-fold slowdown. I would suspect this is similar for cfcmov? So how is > > this > > reflected in the decision to if-convert? > But for APXF, we were told the cost of invalid memory is as cheap as valid > ones. > (Why else would this instructions be designed?) Well - I wondered about this for the AVX512 case, so this isn't a good reason to expect it to be any better for APXF ;) But if you have confirmation this is to be expected (I would expect the silicon design with APX is finished at this point, even if actual hardware is still 2-3 years out), then fine - consider me positively surprised ;) Richard. > > > > > To enable CFCMOV, we add a target HOOK > > > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > > > in if-conversion pass to allow convert to cmov. > > > > > > All the changes passed bootstrap & regtest x86-64-pc-linux-gnu. > > > We also tested spec with SDE and passed the runtime test. > > > > > > Ok for trunk? > > > > > > [1].https://www.intel.com/content/www/us/en/developer/articles/technic > > > al/advanced-performance-extensions-apx.html > > > > > > Lingling Kong (3): > > > [APX CFCMOV] Add a new target hook: > > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > > > [APX CFCMOV] Support APX CFCMOV in if_convert pass > > > [APX CFCMOV] Support APX CFCMOV in backend > > > > > > gcc/config/i386/i386-expand.cc | 63 + > > > gcc/config/i386/i386-opts.h | 4 +- > > > gcc/config/i386/i386.cc | 33 ++- > > > gcc/config/i386/i386.h | 1 + > > > gcc/config/i386/i386.md | 53 +++- > > > gcc/config/i386/i386.opt | 3 + > > > gcc/config/i386/predicates.md| 7 + > > > gcc/doc/tm.texi | 6 + > > > gcc/doc/tm.texi.in | 2 + > > > gcc/ifcvt.cc | 247 ++- > > > gcc/target.def | 11 + > > > gcc/targhooks.cc | 8 + > > > gcc/targhooks.h | 1 + > > > gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c | 73 ++ > > > gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c | 40 +++ > > > 15 files changed, 539 insertions(+), 13 deletions(-) create mode > > > 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > > > > > -- > > > 2.31.1 > > >
RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
Thanks Richard for comments. > :c shouldn't be necessary on the plus > or on the bit_xor > OK with those changes. Will remove the :c and commit it if there is no surprise from test suites. Pan -Original Message- From: Richard Biener Sent: Friday, June 14, 2024 4:05 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB On Wed, Jun 12, 2024 at 2:38 PM wrote: > > From: Pan Li > > After we support the scalar unsigned form 1 and 2, we would like > to introduce more forms include the branch and branchless. There > are forms 3-10 list as below: > > Form 3: > #define SAT_SUB_U_3(T) \ > T sat_sub_u_3_##T (T x, T y) \ > { \ > return x > y ? x - y : 0; \ > } > > Form 4: > #define SAT_SUB_U_4(T) \ > T sat_sub_u_4_##T (T x, T y) \ > { \ > return x >= y ? x - y : 0; \ > } > > Form 5: > #define SAT_SUB_U_5(T) \ > T sat_sub_u_5_##T (T x, T y) \ > { \ > return x < y ? 0 : x - y; \ > } > > Form 6: > #define SAT_SUB_U_6(T) \ > T sat_sub_u_6_##T (T x, T y) \ > { \ > return x <= y ? 0 : x - y; \ > } > > Form 7: > #define SAT_SUB_U_7(T) \ > T sat_sub_u_7_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return ret & (T)(overflow - 1); \ > } > > Form 8: > #define SAT_SUB_U_8(T) \ > T sat_sub_u_8_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return ret & (T)-(!overflow); \ > } > > Form 9: > #define SAT_SUB_U_9(T) \ > T sat_sub_u_9_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return overflow ? 0 : ret; \ > } > > Form 10: > #define SAT_SUB_U_10(T) \ > T sat_sub_u_10_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return !overflow ? ret : 0; \ > } > > Take form 10 as example: > > SAT_SUB_U_10(uint64_t); > > Before this patch: > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y) > { > unsigned char _1; > unsigned char _2; > uint8_t _3; > __complex__ unsigned char _6; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _6 = .SUB_OVERFLOW (x_4(D), y_5(D)); > _2 = IMAGPART_EXPR <_6>; > if (_2 == 0) > goto ; [50.00%] > else > goto ; [50.00%] > ;;succ: 3 > ;;4 > > ;; basic block 3, loop depth 0 > ;;pred: 2 > _1 = REALPART_EXPR <_6>; > ;;succ: 4 > > ;; basic block 4, loop depth 0 > ;;pred: 2 > ;;3 > # _3 = PHI <0(2), _1(3)> > return _3; > ;;succ: EXIT > > } > > After this patch: > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y) > { > uint8_t _3; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call] > return _3; > ;;succ: EXIT > > } > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test with newlib. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap test. > 4. The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: Add more match for unsigned sat_sub. > * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new > func impl to match phi node for .SAT_SUB. > (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB > for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 25 +++-- > gcc/tree-ssa-math-opts.cc | 33 + > 2 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5cfe81e80b3..66e411b3359 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > /* Unsigned saturation sub, case 1 (branch with gt): > SAT_U_SUB = X > Y ? X - Y : 0 */ > (match (unsigned_integer_sat_sub @0 @1) > - (cond (gt @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > /* Unsigned saturation sub, case 2 (branch with ge): > SAT_U_SUB = X >= Y ? X - Y : 0. */ > (match (unsigned_integer_sat_sub @0 @1) > - (cond (ge @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > @@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > +/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW. > */ > +(match (unsigned_integer_sat_sub @0 @1) > + (bit_and:c (realpart (IFN_SUB_OVERFLO
[pushed] doc: Consolidate duplicate MOVBE listings for Intel CPUs
It might be a good idea to sort these lists alphabetically - easier to findi specific entries if one is looking for it, and that also avoids such cases? Pushed for now. Gerald gcc: * doc/invoke.texi (x86 Options): Consolidate duplicate MOVBE listings for haswell, broadwell, skylake, skylake-avx512, cannonlake, icelake-client, icelake-server, cascadelake, cooperlake, tigerlake, sapphirerapids, rocketlake, graniterapids, and graniterapids-d options to -march. --- gcc/doc/invoke.texi | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 26e6a349d51..5d7a87fde86 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -34476,18 +34476,18 @@ SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND and F16C instruction set support. @item haswell -Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +Intel Haswell CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE and HLE instruction set support. @item broadwell -Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +Intel Broadwell CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX and PREFETCHW instruction set support. @item skylake -Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +Intel Skylake CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES and SGX instruction set support. @@ -34548,14 +34548,14 @@ ENQCMD, UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512, SM3, SM4, USER_MSR and PREFETCHI instruction set support. @item skylake-avx512 -Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, +Intel Skylake Server CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, CLWB, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set support. @item cannonlake -Intel Cannonlake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, +Intel Cannonlake Server CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, @@ -34563,7 +34563,7 @@ AVX512DQ, AVX512CD, PKU, AVX512VBMI, AVX512IFMA and SHA instruction set support. @item icelake-client -Intel Icelake Client CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, +Intel Icelake Client CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, AVX512DQ, @@ -34571,7 +34571,7 @@ AVX512CD, PKU, AVX512VBMI, AVX512IFMA, SHA, AVX512VNNI, GFNI, VAES, AVX512VBMI2 , VPCLMULQDQ, AVX512BITALG, RDPID and AVX512VPOPCNTDQ instruction set support. @item icelake-server -Intel Icelake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, +Intel Icelake Server CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, AVX512DQ, @@ -34580,21 +34580,21 @@ AVX512CD, PKU, AVX512VBMI, AVX512IFMA, SHA, AVX512VNNI, GFNI, VAES, AVX512VBMI2 instruction set support. @item cascadelake -Intel Cascadelake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +Intel Cascadelake CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, CLWB, AVX512VL, AVX512BW, AVX512DQ, AVX512CD and AVX512VNNI instruction set support. @item cooperlake -Intel cooperlake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +Intel cooperlake CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCN
Re: PING^1 [PATCH 44/52] s390: New hook implementation s390_c_mode_for_floating_type
On 6/13/24 09:43, Kewen.Lin wrote: Hi, Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653382.html BR, Kewen on 2024/6/3 11:01, Kewen Lin wrote: This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE defines in s390 port, and add new port specific hook implementation s390_c_mode_for_floating_type. gcc/ChangeLog: * config/s390/s390.cc (s390_c_mode_for_floating_type): New function. (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro. * config/s390/s390.h (FLOAT_TYPE_SIZE): Remove. (DOUBLE_TYPE_SIZE): Likewise. (LONG_DOUBLE_TYPE_SIZE): Likewise. Ok. Thanks! Andreas --- gcc/config/s390/s390.cc | 15 +++ gcc/config/s390/s390.h | 3 --- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index fa517bd3e77..117da36b3c0 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -18066,6 +18066,18 @@ s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) return default_noce_conversion_profitable_p (seq, if_info); } +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE. Return TFmode or DFmode + for TI_LONG_DOUBLE_TYPE which is for long double type, go with the + default one for the others. */ + +static machine_mode +s390_c_mode_for_floating_type (enum tree_index ti) +{ + if (ti == TI_LONG_DOUBLE_TYPE) +return TARGET_LONG_DOUBLE_128 ? TFmode : DFmode; + return default_mode_for_floating_type (ti); +} + /* Initialize GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP @@ -18382,6 +18394,9 @@ s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) #undef TARGET_NOCE_CONVERSION_PROFITABLE_P #define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p +#undef TARGET_C_MODE_FOR_FLOATING_TYPE +#define TARGET_C_MODE_FOR_FLOATING_TYPE s390_c_mode_for_floating_type + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-s390.h" diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 0ea8802..4a4dde1a9ba 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -396,9 +396,6 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv); #define INT_TYPE_SIZE 32 #define LONG_TYPE_SIZE (TARGET_64BIT ? 64 : 32) #define LONG_LONG_TYPE_SIZE 64 -#define FLOAT_TYPE_SIZE 32 -#define DOUBLE_TYPE_SIZE 64 -#define LONG_DOUBLE_TYPE_SIZE (TARGET_LONG_DOUBLE_128 ? 128 : 64) /* Work around target_flags dependency in ada/targtyps.cc. */ #define WIDEST_HARDWARE_FP_SIZE 64
Re: [PATCH] s390: testsuite: Fix nobp-table-jump-*.c
On Mon, Jun 03, 2024 at 03:43:39PM +0200, Stefan Schulze Frielinghaus wrote: Starting with r14-5628-g53ba8d669550d3 interprocedural VRP became strong enough in order to render these tests useless. Fixed by disabling IPA. gcc/testsuite/ChangeLog: * gcc.target/s390/nobp-table-jump-inline-z10.c: Do not perform IPA. * gcc.target/s390/nobp-table-jump-inline-z900.c: Dito. * gcc.target/s390/nobp-table-jump-z10.c: Dito. * gcc.target/s390/nobp-table-jump-z900.c: Dito. --- Ok for mainline? Ok. Thanks! Andreas .../s390/nobp-table-jump-inline-z10.c | 42 +-- .../s390/nobp-table-jump-inline-z900.c| 42 +-- .../gcc.target/s390/nobp-table-jump-z10.c | 42 +-- .../gcc.target/s390/nobp-table-jump-z900.c| 42 +-- 4 files changed, 84 insertions(+), 84 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c index 8dfd7e4c786..121751166d0 100644 --- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c +++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c @@ -4,29 +4,29 @@ /* case-values-threshold will be set to 20 by the back-end when jump thunk are requested. */ -int __attribute__((noinline,noclone)) foo1 (void) { return 1; } -int __attribute__((noinline,noclone)) foo2 (void) { return 2; } -int __attribute__((noinline,noclone)) foo3 (void) { return 3; } -int __attribute__((noinline,noclone)) foo4 (void) { return 4; } -int __attribute__((noinline,noclone)) foo5 (void) { return 5; } -int __attribute__((noinline,noclone)) foo6 (void) { return 6; } -int __attribute__((noinline,noclone)) foo7 (void) { return 7; } -int __attribute__((noinline,noclone)) foo8 (void) { return 8; } -int __attribute__((noinline,noclone)) foo9 (void) { return 9; } -int __attribute__((noinline,noclone)) foo10 (void) { return 10; } -int __attribute__((noinline,noclone)) foo11 (void) { return 11; } -int __attribute__((noinline,noclone)) foo12 (void) { return 12; } -int __attribute__((noinline,noclone)) foo13 (void) { return 13; } -int __attribute__((noinline,noclone)) foo14 (void) { return 14; } -int __attribute__((noinline,noclone)) foo15 (void) { return 15; } -int __attribute__((noinline,noclone)) foo16 (void) { return 16; } -int __attribute__((noinline,noclone)) foo17 (void) { return 17; } -int __attribute__((noinline,noclone)) foo18 (void) { return 18; } -int __attribute__((noinline,noclone)) foo19 (void) { return 19; } -int __attribute__((noinline,noclone)) foo20 (void) { return 20; } +int __attribute__((noipa)) foo1 (void) { return 1; } +int __attribute__((noipa)) foo2 (void) { return 2; } +int __attribute__((noipa)) foo3 (void) { return 3; } +int __attribute__((noipa)) foo4 (void) { return 4; } +int __attribute__((noipa)) foo5 (void) { return 5; } +int __attribute__((noipa)) foo6 (void) { return 6; } +int __attribute__((noipa)) foo7 (void) { return 7; } +int __attribute__((noipa)) foo8 (void) { return 8; } +int __attribute__((noipa)) foo9 (void) { return 9; } +int __attribute__((noipa)) foo10 (void) { return 10; } +int __attribute__((noipa)) foo11 (void) { return 11; } +int __attribute__((noipa)) foo12 (void) { return 12; } +int __attribute__((noipa)) foo13 (void) { return 13; } +int __attribute__((noipa)) foo14 (void) { return 14; } +int __attribute__((noipa)) foo15 (void) { return 15; } +int __attribute__((noipa)) foo16 (void) { return 16; } +int __attribute__((noipa)) foo17 (void) { return 17; } +int __attribute__((noipa)) foo18 (void) { return 18; } +int __attribute__((noipa)) foo19 (void) { return 19; } +int __attribute__((noipa)) foo20 (void) { return 20; } -int __attribute__((noinline,noclone)) +int __attribute__((noipa)) bar (int a) { int ret = 0; diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c index 46d2c54bcff..5ad0c72afc3 100644 --- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c +++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c @@ -4,29 +4,29 @@ /* case-values-threshold will be set to 20 by the back-end when jump thunk are requested. */ -int __attribute__((noinline,noclone)) foo1 (void) { return 1; } -int __attribute__((noinline,noclone)) foo2 (void) { return 2; } -int __attribute__((noinline,noclone)) foo3 (void) { return 3; } -int __attribute__((noinline,noclone)) foo4 (void) { return 4; } -int __attribute__((noinline,noclone)) foo5 (void) { return 5; } -int __attribute__((noinline,noclone)) foo6 (void) { return 6; } -int __attribute__((noinline,noclone)) foo7 (void) { return 7; } -int __attribute__((noinline,noclone)) foo8 (void) { return 8; } -int __attribute__((noinline,noclone)) foo9 (void) { return 9; } -int __attribute__((noinline,noclone)) foo10 (void) { return 10; } -int __attribute__((noinline,noclone)) foo11 (void) {
[PATCH] gccrs: configure.ac: Fix quoting around ldl/lpthread checks
ChangeLog: * configure.ac: Add quoting around both variable checks performed, which will fix the noise reported when libc contains both ldl and lpthread. A typo around `pthread_create` being typed `pthread_crate` is also fixed. * configure: Regenerate. --- configure| 10 +- configure.ac | 10 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/configure b/configure index a9ea5258f0f..46f00b476c5 100755 --- a/configure +++ b/configure @@ -8827,7 +8827,7 @@ fi -# Rust requires -ldl and -lpthread if you are using an old glibc that does not include them by +# Rust requires -ldl and -lpthread if you are using a libc which does not include them by # default, so we check for them here missing_rust_dynlibs=none @@ -8945,15 +8945,15 @@ if test "$ac_res" != no; then : fi -if test $ac_cv_search_dlopen = -ldl; then +if test "$ac_cv_search_dlopen" = -ldl; then CRAB1_LIBS="$CRAB1_LIBS -ldl" -elif test $ac_cv_search_dlopen = no; then +elif test "$ac_cv_search_dlopen" = no; then missing_rust_dynlibs="libdl" fi -if test $ac_cv_search_pthread_create = -lpthread; then +if test "$ac_cv_search_pthread_create" = -lpthread; then CRAB1_LIBS="$CRAB1_LIBS -lpthread" -elif test $ac_cv_search_pthread_crate = no; then +elif test "$ac_cv_search_pthread_create" = no; then missing_rust_dynlibs="$missing_rust_dynlibs, libpthread" fi diff --git a/configure.ac b/configure.ac index adb738ac346..0befcc01cd6 100644 --- a/configure.ac +++ b/configure.ac @@ -2036,7 +2036,7 @@ fi AC_SUBST(PICFLAG) -# Rust requires -ldl and -lpthread if you are using an old glibc that does not include them by +# Rust requires -ldl and -lpthread if you are using a libc which does not include them by # default, so we check for them here missing_rust_dynlibs=none @@ -2044,15 +2044,15 @@ missing_rust_dynlibs=none AC_SEARCH_LIBS([dlopen], [dl]) AC_SEARCH_LIBS([pthread_create], [pthread]) -if test $ac_cv_search_dlopen = -ldl; then +if test "$ac_cv_search_dlopen" = -ldl; then CRAB1_LIBS="$CRAB1_LIBS -ldl" -elif test $ac_cv_search_dlopen = no; then +elif test "$ac_cv_search_dlopen" = no; then missing_rust_dynlibs="libdl" fi -if test $ac_cv_search_pthread_create = -lpthread; then +if test "$ac_cv_search_pthread_create" = -lpthread; then CRAB1_LIBS="$CRAB1_LIBS -lpthread" -elif test $ac_cv_search_pthread_crate = no; then +elif test "$ac_cv_search_pthread_create" = no; then missing_rust_dynlibs="$missing_rust_dynlibs, libpthread" fi -- 2.42.0
Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
HAO CHEN GUI writes: >> The: >> >> bool ok = recog (attempt, use_change); >> >> should leave INSN_CODE set to the result of the successful recog. >> Why isn't that true in the example you hit? >> >> I wondered whether we might be trying to cost a NOOP_MOVE_INSN_CODE, >> since I couldn't see anything in the current code to stop that. >> But if so, that's a bug. NOOP_MOVE_INSN_CODE should have zero cost, >> and shouldn't go through insn_cost. > > Yes, the recog sets the recog data for the new rtl. But > changes_are_worthwhile calls change->old_cost and finally calls > calculate_cost to get the insn cost for old rtl. The undo/redo_changes > don't restore/recover the recog_data, which causes insn_cost takes the > wrong number of operands (recog_data.n_operands) and wrong operands > (recog_data.operand[]). Note, rs6000_insn_cost checks if the recog data > is cached. If so, it doesn't recog the insn again. > > void > insn_info::calculate_cost () const > { > basic_block cfg_bb = BLOCK_FOR_INSN (m_rtl); > temporarily_undo_changes (0); > m_cost_or_uid = insn_cost (m_rtl, optimize_bb_for_speed_p (cfg_bb)); > redo_changes (0); > } > > // A simple debug output for calculate_cost > before undo > INSN CODE 850 > n_operands 3 > > after undo > INSN CODE 685 > n_operands 3 > > Actually insn pattern 850 has 3 operands and insn pattern 685 has 2 > operands. > > IMHO, we should invalidate recog data after each undo/redo or before > calling the insn cost. Ah, ok. If the problem is stale recog_data information, I think we should instead make recog.cc:swap_change conditionally invalidate it. I.e. change: if (changes[num].object && !MEM_P (changes[num].object)) std::swap (INSN_CODE (changes[num].object), changes[num].old_code); to something like: if (changes[num].object && !MEM_P (changes[num].object)) { std::swap (INSN_CODE (changes[num].object), changes[num].old_code); if (recog_data.insn == changes[num].object) recot_data.insn = nullptr; } (untested). IMO this is better then resetting the INSN_CODE, since the INSN_CODE should be correct and is relatively expensive to recompute. Thanks, Richard
Re: [PATCH] build: Fix missing variable quotes
Hi Collin, Sorry about the mess. As Sam pointed out, there was already a patch proposed in the bugzilla PR which I've posted on the ML as well for review. The only difference with your patch is that it also changes ac_cv_search_pthread_crate to ac_cv_search_pthread_create, as otherwise the check will always be false. Best, Arthur On 6/14/24 02:53, Collin Funk wrote: When dlopen and pthread_create are in libc the variable is set to "none required", therefore running configure will show the following errors: ./configure: line 8997: test: too many arguments ./configure: line 8999: test: too many arguments ./configure: line 9003: test: too many arguments ./configure: line 9005: test: =: unary operator expected ChangeLog: * configure.ac: Quote variable result of AC_SEARCH_LIBS. * configure: Regenerate. Signed-off-by: Collin Funk --- configure| 10 +- configure.ac | 8 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/configure b/configure index 51576a41f30..6e95b27d9df 100755 --- a/configure +++ b/configure @@ -8994,15 +8994,15 @@ if test "$ac_res" != no; then : fi -if test $ac_cv_search_dlopen = -ldl; then +if test "$ac_cv_search_dlopen" = -ldl; then CRAB1_LIBS="$CRAB1_LIBS -ldl" -elif test $ac_cv_search_dlopen = no; then +elif test "$ac_cv_search_dlopen" = no; then missing_rust_dynlibs="libdl" fi -if test $ac_cv_search_pthread_create = -lpthread; then +if test "$ac_cv_search_pthread_create" = -lpthread; then CRAB1_LIBS="$CRAB1_LIBS -lpthread" -elif test $ac_cv_search_pthread_crate = no; then +elif test "$ac_cv_search_pthread_crate" = no; then missing_rust_dynlibs="$missing_rust_dynlibs, libpthread" fi @@ -19746,7 +19746,7 @@ config.status configured by $0, generated by GNU Autoconf 2.69, with options \\"\$ac_cs_config\\" -Copyright (C) 2012 Free Software Foundation, Inc. +Copyright (C) Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it." diff --git a/configure.ac b/configure.ac index 5eda8dcdbf7..88576b31bfc 100644 --- a/configure.ac +++ b/configure.ac @@ -2045,15 +2045,15 @@ missing_rust_dynlibs=none AC_SEARCH_LIBS([dlopen], [dl]) AC_SEARCH_LIBS([pthread_create], [pthread]) -if test $ac_cv_search_dlopen = -ldl; then +if test "$ac_cv_search_dlopen" = -ldl; then CRAB1_LIBS="$CRAB1_LIBS -ldl" -elif test $ac_cv_search_dlopen = no; then +elif test "$ac_cv_search_dlopen" = no; then missing_rust_dynlibs="libdl" fi -if test $ac_cv_search_pthread_create = -lpthread; then +if test "$ac_cv_search_pthread_create" = -lpthread; then CRAB1_LIBS="$CRAB1_LIBS -lpthread" -elif test $ac_cv_search_pthread_crate = no; then +elif test "$ac_cv_search_pthread_crate" = no; then missing_rust_dynlibs="$missing_rust_dynlibs, libpthread" fi
Re: [PATCH] build: Fix missing variable quotes
Sam James 于2024年6月14日周五 09:02写道: > > Collin Funk writes: > > > When dlopen and pthread_create are in libc the variable is > > set to "none required", therefore running configure will show > > the following errors: > > > > ./configure: line 8997: test: too many arguments > > ./configure: line 8999: test: too many arguments > > ./configure: line 9003: test: too many arguments > > ./configure: line 9005: test: =: unary operator expected > > > > ChangeLog: > > > > * configure.ac: Quote variable result of AC_SEARCH_LIBS. > > * configure: Regenerate. > > This is PR115453 (which also needs to address a 'crate' typo). > I noticed another similar problem. I guess that we can put them in a single patch: --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -5317,7 +5327,7 @@ x: AC_MSG_CHECKING(assembler and linker for explicit JALR relocation) gcc_cv_as_ld_jalr_reloc=no -if test $gcc_cv_as_mips_explicit_relocs = yes; then +if test x$gcc_cv_as_mips_explicit_relocs = xyes; then if test $in_tree_ld = yes ; then if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 20 -o "$gcc_cv_gld_major_version" -gt 2 \ && test $in_tree_ld_is_elf = yes; then
[PATCH][v2] Support single def-use cycle optimization for SLP reduction vectorization
We can at least mimic single def-use cycle optimization when doing single-lane SLP reductions and that's required to avoid regressing compared to non-SLP. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-vect-loop.cc (vectorizable_reduction): Allow single-def-use cycles with SLP. (vect_transform_reduction): Handle SLP single def-use cycles. (vect_transform_cycle_phi): Likewise. * gcc.dg/vect/slp-reduc-12.c: New testcase. --- gcc/testsuite/gcc.dg/vect/slp-reduc-12.c | 18 ++ gcc/tree-vect-loop.cc| 45 ++-- 2 files changed, 45 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/slp-reduc-12.c diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c new file mode 100644 index 000..625f8097c54 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_hw_misalign } */ +/* { dg-additional-options "-Ofast" } */ + +double foo (double *x, int * __restrict a, int n) +{ + double r = 0.; + for (int i = 0; i < n; ++i) +{ + a[i] = a[i] + i; + r += x[i]; +} + return r; +} + +/* { dg-final { scan-tree-dump "using single def-use cycle for reduction" "vect" } } */ diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index bbd5d261907..d9a2ad69484 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -8320,7 +8320,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo, participating. When unrolling we want each unrolled iteration to have its own reduction accumulator since one of the main goals of unrolling a reduction is to reduce the aggregate loop-carried latency. */ - if (ncopies > 1 + if ((ncopies > 1 + || (slp_node + && !REDUC_GROUP_FIRST_ELEMENT (stmt_info) + && SLP_TREE_LANES (slp_node) == 1 + && vect_get_num_copies (loop_vinfo, vectype_in) > 1)) && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live) && reduc_chain_length == 1 && loop_vinfo->suggested_unroll_factor == 1) @@ -8373,6 +8377,10 @@ vectorizable_reduction (loop_vec_info loop_vinfo, single_defuse_cycle = false; } } + if (dump_enabled_p () && single_defuse_cycle) +dump_printf_loc (MSG_NOTE, vect_location, +"using single def-use cycle for reduction by reducing " +"multiple vectors to one in the loop body\n"); STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info) = single_defuse_cycle; /* If the reduction stmt is one of the patterns that have lane @@ -8528,9 +8536,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo, { tree vectype_out = STMT_VINFO_VECTYPE (stmt_info); class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - int i; - int ncopies; - int vec_num; + unsigned ncopies; + unsigned vec_num; stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info); gcc_assert (reduc_info->is_reduc_info); @@ -8577,7 +8584,6 @@ vect_transform_reduction (loop_vec_info loop_vinfo, auto_vec vec_oprnds0; auto_vec vec_oprnds1; auto_vec vec_oprnds2; - tree def0; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n"); @@ -8652,20 +8658,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo, definition. */ if (single_defuse_cycle) { - gcc_assert (!slp_node); - vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1, -op.ops[reduc_index], -reduc_index == 0 ? &vec_oprnds0 -: (reduc_index == 1 ? &vec_oprnds1 - : &vec_oprnds2)); + vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, 1, +reduc_index == 0 ? op.ops[0] : NULL_TREE, &vec_oprnds0, +reduc_index == 1 ? op.ops[1] : NULL_TREE, &vec_oprnds1, +reduc_index == 2 ? op.ops[2] : NULL_TREE, +&vec_oprnds2); } bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info); - FOR_EACH_VEC_ELT (vec_oprnds0, i, def0) + unsigned num = (reduc_index == 0 + ? vec_oprnds1.length () : vec_oprnds0.length ()); + for (unsigned i = 0; i < num; ++i) { gimple *new_stmt; - tree vop[3] = { def0, vec_oprnds1[i], NULL_TREE }; + tree vop[3] = { vec_oprnds0[i], vec_oprnds1[i], NULL_TREE }; if (masked_loop_p && !mask_by_cond_expr) { /* No conditional ifns have been defined for dot-product yet. */ @@ -8720,10 +8727,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo, vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt, gsi); } - if (slp_n
[PATCH] Adjust gcc.target/i386/vect-strided-3.c
The following disables SSE4 instead of just AVX to avoid pextrq being used, confusing the assembler scanning. This avoids the reported failure with -march=cascadelake but adds a FAIL for -march=cascadelake -m32 (I've opened PR115487 for that). Tested on x86_64-unknown-linux-gnu, pushed. * gcc.target/i386/vect-strided-3.c: Disable SSE4 instead of AVX. --- gcc/testsuite/gcc.target/i386/vect-strided-3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-3.c b/gcc/testsuite/gcc.target/i386/vect-strided-3.c index b462701a0b2..f9c54a6f715 100644 --- a/gcc/testsuite/gcc.target/i386/vect-strided-3.c +++ b/gcc/testsuite/gcc.target/i386/vect-strided-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -msse2 -mno-avx -fno-tree-slp-vectorize" } */ +/* { dg-options "-O2 -msse2 -mno-sse4 -fno-tree-slp-vectorize" } */ void foo (int * __restrict a, int *b, int s) { -- 2.35.3
Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.
On 07/06/24 19:40 +0100, Roger Sayle wrote: This patch restores bootstrap when using g++ 4.8 as a host compiler. Returning a std::unique_ptr requires a std::move on C++ compilers (pre-C++17) that don't guarantee copy elision/return value optimization. It doesn't though. The C++17 guaranteed copy elision rules are not relevant here. This is about lookup for the constructor used in the return statement, and whether that lookup considers the variable to be an lvalue or an rvalue. C++11 already says this is valid: i#include std::unique_ptr f() { std::unique_ptr m; return m; } See C++11 12.8 [class.copy] p31: This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies): - in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv- unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value and then p32: When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. The constructor isn't required to be elided in C++11, but the compiler is required to use a move constructor instead of a copy constructor, if a move constructor is available. So you don't need to use std::move on the return value. And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even with 4.7.1). So the std::move calls you've added are redundant, and will cause -Wredundant-move warnings. What's the error you were seeing? Bootstrapped on x86_64-pc-linux-gnu using both gcc 4.8.5 (system) and gcc 10.2.1 (using "scl enable devetoolset-10") as host compilers. Ok for mainline? 2024-06-07 Roger Sayle gcc/analyzer/ChangeLog * constraint-manager.cc (equiv_class::make_dump_widget): Use std::move to return a std::unique_ptr. (bounded_ranges_constraint::make_dump_widget): Likewise. (constraint_manager::make_dump_widget): Likewise. * program_state.cc (sm_state_map::make_dump_widget): Likewise. (program_state::make_dump_widget): Likewise. * region-model.cc (region_to_value_map::make_dump_widget): Likewise. (region_model::make_dump_widget): Likewise. * region.cc (region::make_dump_widget): Likewise. * store.cc (binding_cluster::make_dump_widget): Likewise. (store::make_dump_widget): Likewise. * svalue.cc (svalue::make_dump_widget): Likewise. Thanks in advance, Roger -- diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc index 707385d..883f33b 100644 --- a/gcc/analyzer/constraint-manager.cc +++ b/gcc/analyzer/constraint-manager.cc @@ -1176,7 +1176,7 @@ equiv_class::make_dump_widget (const text_art::dump_widget_info &dwi, ec_widget->add_child (tree_widget::make (dwi, &pp)); } - return ec_widget; + return std::move (ec_widget); } /* Generate a hash value for this equiv_class. @@ -1500,7 +1500,7 @@ make_dump_widget (const text_art::dump_widget_info &dwi) const (tree_widget::from_fmt (dwi, nullptr, "ec%i bounded ranges", m_ec_id.as_int ())); m_ranges->add_to_dump_widget (*brc_widget.get (), dwi); - return brc_widget; + return std::move (brc_widget); } bool @@ -1853,7 +1853,7 @@ constraint_manager::make_dump_widget (const text_art::dump_widget_info &dwi) con if (cm_widget->get_num_children () == 0) return nullptr; - return cm_widget; + return std::move (cm_widget); } /* Attempt to add the constraint LHS OP RHS to this constraint_manager. diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc index dc2d4bd..efaf569 100644 --- a/gcc/analyzer/program-state.cc +++ b/gcc/analyzer/program-state.cc @@ -382,7 +382,7 @@ sm_state_map::make_dump_widget (const text_art::dump_widget_info &dwi, state_widget->add_child (tree_widget::make (dwi, pp)); } - return state_widget; + return std::move (state_widget); } /* Return true if no states have been set within this map @@ -1247,7 +1247,7 @@ program_state::make_dump_widget (const text_art::dump_widget_info &dwi) const state_widget->add_child (smap->make_dump_widget (dwi, m_region_model)); } - return state_widget; + return std::move (state_widget); } /* Update this program_state to reflect a top-level call to FUN. diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index d142d85..4fbc970 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -288,7 +288,7 @@ make_dump_widget (const text_art::dump_wi
Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.
On 14/06/24 10:36 +0100, Jonathan Wakely wrote: On 07/06/24 19:40 +0100, Roger Sayle wrote: This patch restores bootstrap when using g++ 4.8 as a host compiler. Returning a std::unique_ptr requires a std::move on C++ compilers (pre-C++17) that don't guarantee copy elision/return value optimization. It doesn't though. The C++17 guaranteed copy elision rules are not relevant here. This is about lookup for the constructor used in the return statement, and whether that lookup considers the variable to be an lvalue or an rvalue. C++11 already says this is valid: i#include std::unique_ptr f() { std::unique_ptr m; return m; } See C++11 12.8 [class.copy] p31: This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies): - in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv- unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value and then p32: When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. The constructor isn't required to be elided in C++11, but the compiler is required to use a move constructor instead of a copy constructor, if a move constructor is available. So you don't need to use std::move on the return value. And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even with 4.7.1). So the std::move calls you've added are redundant, and will cause -Wredundant-move warnings. What's the error you were seeing? I can reproduce it: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function ‘std::unique_ptr ana::equiv_class::make_dump_widget(const text_art::dump_widget_info&, unsigned int) const’: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind ‘std::unique_ptr’ lvalue to ‘std::unique_ptr&&’ return ec_widget; ^ Odd ... I'm looking into it ...
[PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
The following retires vcond{,u,eq} optabs by stopping to use them from the middle-end. Targets instead (should) implement vcond_mask and vec_cmp{,u,eq} optabs. The PR this change refers to lists possibly affected targets - those implementing these patterns, and in particular it lists mips, sparc and ia64 as targets that most definitely will regress while others might simply remove their vcond{,u,eq} patterns. I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. I know riscv doesn't implement any of the legacy optabs. But less maintained vector targets might need adjustments. I want to get rid of those optabs for GCC 15. If I don't hear from you I will assume your target is fine. Thanks, Richard. PR middle-end/114189 * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing. (get_vcond_eq_icode): Likewise. --- gcc/optabs-query.h | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h index 0cb2c21ba85..31fbce80175 100644 --- a/gcc/optabs-query.h +++ b/gcc/optabs-query.h @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode mask_mode) mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE. */ inline enum insn_code -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns) +get_vcond_icode (machine_mode, machine_mode, bool) { - enum insn_code icode = CODE_FOR_nothing; - if (uns) -icode = convert_optab_handler (vcondu_optab, vmode, cmode); - else -icode = convert_optab_handler (vcond_optab, vmode, cmode); - return icode; + return CODE_FOR_nothing; } /* Return insn code for a conditional operator with a mask mode @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode mmode) mode CMODE (only EQ/NE), resulting in a value of mode VMODE. */ inline enum insn_code -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode) +get_vcond_eq_icode (machine_mode, machine_mode) { - return convert_optab_handler (vcondeq_optab, vmode, cmode); + return CODE_FOR_nothing; } /* Enumerates the possible extraction_insn operations. */ -- 2.35.3
Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.
On 14/06/24 11:26 +0100, Jonathan Wakely wrote: On 14/06/24 10:36 +0100, Jonathan Wakely wrote: On 07/06/24 19:40 +0100, Roger Sayle wrote: This patch restores bootstrap when using g++ 4.8 as a host compiler. Returning a std::unique_ptr requires a std::move on C++ compilers (pre-C++17) that don't guarantee copy elision/return value optimization. It doesn't though. The C++17 guaranteed copy elision rules are not relevant here. This is about lookup for the constructor used in the return statement, and whether that lookup considers the variable to be an lvalue or an rvalue. C++11 already says this is valid: i#include std::unique_ptr f() { std::unique_ptr m; return m; } See C++11 12.8 [class.copy] p31: This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies): - in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv- unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value and then p32: When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. The constructor isn't required to be elided in C++11, but the compiler is required to use a move constructor instead of a copy constructor, if a move constructor is available. So you don't need to use std::move on the return value. And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even with 4.7.1). So the std::move calls you've added are redundant, and will cause -Wredundant-move warnings. What's the error you were seeing? I can reproduce it: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function ‘std::unique_ptr ana::equiv_class::make_dump_widget(const text_art::dump_widget_info&, unsigned int) const’: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind ‘std::unique_ptr’ lvalue to ‘std::unique_ptr&&’ return ec_widget; ^ Odd ... I'm looking into it ... Ah, the problem here is that the function's return type is std::unique_ptr but the return value is std::unique_ptr and that requires the converting constructor, not the move constructor. That *did* change after C++11, and isn't supported by G++ until 5.1 Is there a reason that function has to return unique_ptr instead of unique_ptr? The conversion to ptr-to-base could happen at the call site (where you always have an rvalue) instead of on the return value.
Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.
On 14/06/24 11:37 +0100, Jonathan Wakely wrote: On 14/06/24 11:26 +0100, Jonathan Wakely wrote: On 14/06/24 10:36 +0100, Jonathan Wakely wrote: On 07/06/24 19:40 +0100, Roger Sayle wrote: This patch restores bootstrap when using g++ 4.8 as a host compiler. Returning a std::unique_ptr requires a std::move on C++ compilers (pre-C++17) that don't guarantee copy elision/return value optimization. It doesn't though. The C++17 guaranteed copy elision rules are not relevant here. This is about lookup for the constructor used in the return statement, and whether that lookup considers the variable to be an lvalue or an rvalue. C++11 already says this is valid: i#include std::unique_ptr f() { std::unique_ptr m; return m; } See C++11 12.8 [class.copy] p31: This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies): - in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv- unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value and then p32: When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. The constructor isn't required to be elided in C++11, but the compiler is required to use a move constructor instead of a copy constructor, if a move constructor is available. So you don't need to use std::move on the return value. And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even with 4.7.1). So the std::move calls you've added are redundant, and will cause -Wredundant-move warnings. What's the error you were seeing? I can reproduce it: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function ‘std::unique_ptr ana::equiv_class::make_dump_widget(const text_art::dump_widget_info&, unsigned int) const’: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind ‘std::unique_ptr’ lvalue to ‘std::unique_ptr&&’ return ec_widget; ^ Odd ... I'm looking into it ... Ah, the problem here is that the function's return type is std::unique_ptr but the return value is std::unique_ptr and that requires the converting constructor, not the move constructor. That *did* change after C++11, and isn't supported by G++ until 5.1 That changed with https://cplusplus.github.io/CWG/issues/1579.html which is in C++14 and is a DR against C++11, but of course compilers that implement the original C++11 rules don't support that DR. Arguably, g++ should not give -Wredundant-move warnings here when using the -std=c++11 dialect, because it's only redundant if the C++11 compiler implements that DR. Is there a reason that function has to return unique_ptr instead of unique_ptr? The conversion to ptr-to-base could happen at the call site (where you always have an rvalue) instead of on the return value. Changing the return type to match the return value would avoid the issue entirely though, as it would be valid with all C++11 compilers.
Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.
On 14/06/24 11:42 +0100, Jonathan Wakely wrote: On 14/06/24 11:37 +0100, Jonathan Wakely wrote: On 14/06/24 11:26 +0100, Jonathan Wakely wrote: On 14/06/24 10:36 +0100, Jonathan Wakely wrote: On 07/06/24 19:40 +0100, Roger Sayle wrote: This patch restores bootstrap when using g++ 4.8 as a host compiler. Returning a std::unique_ptr requires a std::move on C++ compilers (pre-C++17) that don't guarantee copy elision/return value optimization. It doesn't though. The C++17 guaranteed copy elision rules are not relevant here. This is about lookup for the constructor used in the return statement, and whether that lookup considers the variable to be an lvalue or an rvalue. C++11 already says this is valid: i#include std::unique_ptr f() { std::unique_ptr m; return m; } See C++11 12.8 [class.copy] p31: This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies): - in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv- unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value and then p32: When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. The constructor isn't required to be elided in C++11, but the compiler is required to use a move constructor instead of a copy constructor, if a move constructor is available. So you don't need to use std::move on the return value. And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even with 4.7.1). So the std::move calls you've added are redundant, and will cause -Wredundant-move warnings. What's the error you were seeing? I can reproduce it: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function ‘std::unique_ptr ana::equiv_class::make_dump_widget(const text_art::dump_widget_info&, unsigned int) const’: /home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind ‘std::unique_ptr’ lvalue to ‘std::unique_ptr&&’ return ec_widget; ^ Odd ... I'm looking into it ... Ah, the problem here is that the function's return type is std::unique_ptr but the return value is std::unique_ptr and that requires the converting constructor, not the move constructor. That *did* change after C++11, and isn't supported by G++ until 5.1 That changed with https://cplusplus.github.io/CWG/issues/1579.html Which for the record was implemented in r5-1576-gfb682f9458c6cf (by me, which I'd forgotten!) which is in C++14 and is a DR against C++11, but of course compilers that implement the original C++11 rules don't support that DR. Arguably, g++ should not give -Wredundant-move warnings here when using the -std=c++11 dialect, because it's only redundant if the C++11 compiler implements that DR. Is there a reason that function has to return unique_ptr instead of unique_ptr? The conversion to ptr-to-base could happen at the call site (where you always have an rvalue) instead of on the return value. Changing the return type to match the return value would avoid the issue entirely though, as it would be valid with all C++11 compilers.
Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
在 2024/6/14 16:36, Richard Sandiford 写道: > Ah, ok. If the problem is stale recog_data information, I think we > should instead make recog.cc:swap_change conditionally invalidate it. > I.e. change: > > if (changes[num].object && !MEM_P (changes[num].object)) > std::swap (INSN_CODE (changes[num].object), changes[num].old_code); > > to something like: > > if (changes[num].object && !MEM_P (changes[num].object)) > { > std::swap (INSN_CODE (changes[num].object), changes[num].old_code); > if (recog_data.insn == changes[num].object) > recot_data.insn = nullptr; > } > > (untested). IMO this is better then resetting the INSN_CODE, since the > INSN_CODE should be correct and is relatively expensive to recompute. It's better. I will modify the patch and test it. Thanks Gui Haochen
Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Hi Richard, Thanks for your comments. 在 2024/6/12 15:51, Richard Sandiford 写道: > It should only be necessary to call change_is_worthwhile once, > with strict == !prop.likely_profitable_p () > > So something like: > > bool ok = recog (attempt, use_change); > if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()) > { > bool strict_p = !prop.likely_profitable_p (); > if (!change_is_worthwhile (use_change, strict_p)) > { > if (dump_file) > fprintf (dump_file, "change not profitable"); > ok = false; > } > } I will modify the code. Thanks. > The: > > bool ok = recog (attempt, use_change); > > should leave INSN_CODE set to the result of the successful recog. > Why isn't that true in the example you hit? > > I wondered whether we might be trying to cost a NOOP_MOVE_INSN_CODE, > since I couldn't see anything in the current code to stop that. > But if so, that's a bug. NOOP_MOVE_INSN_CODE should have zero cost, > and shouldn't go through insn_cost. Yes, the recog sets the recog data for the new rtl. But changes_are_worthwhile calls change->old_cost and finally calls calculate_cost to get the insn cost for old rtl. The undo/redo_changes don't restore/recover the recog_data, which causes insn_cost takes the wrong number of operands (recog_data.n_operands) and wrong operands (recog_data.operand[]). Note, rs6000_insn_cost checks if the recog data is cached. If so, it doesn't recog the insn again. void insn_info::calculate_cost () const { basic_block cfg_bb = BLOCK_FOR_INSN (m_rtl); temporarily_undo_changes (0); m_cost_or_uid = insn_cost (m_rtl, optimize_bb_for_speed_p (cfg_bb)); redo_changes (0); } // A simple debug output for calculate_cost before undo INSN CODE 850 n_operands 3 after undo INSN CODE 685 n_operands 3 Actually insn pattern 850 has 3 operands and insn pattern 685 has 2 operands. IMHO, we should invalidate recog data after each undo/redo or before calling the insn cost. Looking forward to your advice. Thanks Gui Haochen
Re: [Patch, rs6000, middle-end] v2: Add implementation for different targets for pair mem fusion
Ajit Agarwal writes: > Hello Richard: > > All comments are addressed. I don't think this addresses the following comments from the previous reviews: (1) It is not correct to mark existing insn uses as live-out. The patch mustn't try to do this. (2) To quote a previous review: It's probably better to create a fresh OO register, rather than change an existing 128-bit register to 256 bits. If we do that, and if reg:V16QI 125 is the destination of the second load (which I assume it is from the 16 offset in the subreg), then the new RTL should be: (vec_select:HI (subreg:V8HI (reg:OO NEW_REG) 16) ...) It's possible to get this by using insn_propagation to replace (reg:V16QI 125) with (subreg:V16QI (reg:OO NEW_REG) 16). insn_propagation should then take care of the rest. There are no existing rtl-ssa routines for handling new registers though. (The idea was to add things as the need arose.) The reason for (2) is that changing the mode of an existing pseudo invalidates all existing references to that pseudo. Although the patch tries to fix things up, it's doing that at a stage where there is already "garbage in" (in the sense that the starting RTL is invalid). Just changing the mode would also invalidate things like REG_EXPR, for example. In contrast, the advantage of creating a new pseudo means that every insn transformation is from structurally valid RTL to structurally valid RTL. It also prevents information being incorrectly carried over from the old pseudo. x Thanks, Richard
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
Richard Biener writes: > The following retires vcond{,u,eq} optabs by stopping to use them > from the middle-end. Targets instead (should) implement vcond_mask > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > possibly affected targets - those implementing these patterns, > and in particular it lists mips, sparc and ia64 as targets that > most definitely will regress while others might simply remove > their vcond{,u,eq} patterns. > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > I know riscv doesn't implement any of the legacy optabs. But less > maintained vector targets might need adjustments. > > I want to get rid of those optabs for GCC 15. If I don't hear from > you I will assume your target is fine. Great! Thanks for doing this. Is there a plan for how we should handle vector comparisons that have to be done as the inverse of the negated condition? Should targets simply not provide vec_cmp for such conditions and leave the target-independent code to deal with the fallout? (For a standalone comparison, it would invert the result. For a VEC_COND_EXPR it would swap the true and false values.) Richard > > Thanks, > Richard. > > PR middle-end/114189 > * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing. > (get_vcond_eq_icode): Likewise. > --- > gcc/optabs-query.h | 13 - > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h > index 0cb2c21ba85..31fbce80175 100644 > --- a/gcc/optabs-query.h > +++ b/gcc/optabs-query.h > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode > mask_mode) > mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE. > */ > > inline enum insn_code > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns) > +get_vcond_icode (machine_mode, machine_mode, bool) > { > - enum insn_code icode = CODE_FOR_nothing; > - if (uns) > -icode = convert_optab_handler (vcondu_optab, vmode, cmode); > - else > -icode = convert_optab_handler (vcond_optab, vmode, cmode); > - return icode; > + return CODE_FOR_nothing; > } > > /* Return insn code for a conditional operator with a mask mode > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode > mmode) > mode CMODE (only EQ/NE), resulting in a value of mode VMODE. */ > > inline enum insn_code > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode) > +get_vcond_eq_icode (machine_mode, machine_mode) > { > - return convert_optab_handler (vcondeq_optab, vmode, cmode); > + return CODE_FOR_nothing; > } > > /* Enumerates the possible extraction_insn operations. */
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On Fri, 14 Jun 2024, Richard Sandiford wrote: > Richard Biener writes: > > The following retires vcond{,u,eq} optabs by stopping to use them > > from the middle-end. Targets instead (should) implement vcond_mask > > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > > possibly affected targets - those implementing these patterns, > > and in particular it lists mips, sparc and ia64 as targets that > > most definitely will regress while others might simply remove > > their vcond{,u,eq} patterns. > > > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > > I know riscv doesn't implement any of the legacy optabs. But less > > maintained vector targets might need adjustments. > > > > I want to get rid of those optabs for GCC 15. If I don't hear from > > you I will assume your target is fine. > > Great! Thanks for doing this. > > Is there a plan for how we should handle vector comparisons that > have to be done as the inverse of the negated condition? Should > targets simply not provide vec_cmp for such conditions and leave > the target-independent code to deal with the fallout? (For a > standalone comparison, it would invert the result. For a VEC_COND_EXPR > it would swap the true and false values.) I would expect that the ISEL pass which currently deals with finding valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this. So how do we deal with this right now? I expect RTL expansion will do the inverse trick, no? Thanks, Richard. > Richard > > > > > Thanks, > > Richard. > > > > PR middle-end/114189 > > * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing. > > (get_vcond_eq_icode): Likewise. > > --- > > gcc/optabs-query.h | 13 - > > 1 file changed, 4 insertions(+), 9 deletions(-) > > > > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h > > index 0cb2c21ba85..31fbce80175 100644 > > --- a/gcc/optabs-query.h > > +++ b/gcc/optabs-query.h > > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode > > mask_mode) > > mode CMODE, unsigned if UNS is true, resulting in a value of mode > > VMODE. */ > > > > inline enum insn_code > > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns) > > +get_vcond_icode (machine_mode, machine_mode, bool) > > { > > - enum insn_code icode = CODE_FOR_nothing; > > - if (uns) > > -icode = convert_optab_handler (vcondu_optab, vmode, cmode); > > - else > > -icode = convert_optab_handler (vcond_optab, vmode, cmode); > > - return icode; > > + return CODE_FOR_nothing; > > } > > > > /* Return insn code for a conditional operator with a mask mode > > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode > > mmode) > > mode CMODE (only EQ/NE), resulting in a value of mode VMODE. */ > > > > inline enum insn_code > > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode) > > +get_vcond_eq_icode (machine_mode, machine_mode) > > { > > - return convert_optab_handler (vcondeq_optab, vmode, cmode); > > + return CODE_FOR_nothing; > > } > > > > /* Enumerates the possible extraction_insn operations. */ > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH 2/3] Enabled LRA for ia64.
Dear Jonathan, Jeff, On 13.06.24 12:33, Jonathan Wakely wrote: On Wed, 12 Jun 2024 at 22:00, Frank Scheiner wrote: Ok, I posted the results as created by contrib/test_summary now: 1. non-LRA version on [1] 2. LRA version on [2] [1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html [2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html Thanks! These ones are probably due to non-reserved names in glibc or kernel headers: FAIL: 17_intro/names.cc -std=gnu++17 (test for excess errors) FAIL: 17_intro/names_pstl.cc -std=gnu++17 (test for excess errors) FAIL: experimental/names.cc -std=gnu++17 (test for excess errors) The errors for all three are probably the same and should be decipherable from libstdc++.log which will show which names defined as macros in names.cc are clashing with names in system headers. Thanks for the hints. I looked into the log now and searched for the respective tests. And the log excerpts for all three failing tests (also attached with more context): ``` FAIL: 17_intro/names.cc -std=gnu++17 (test for excess errors) Excess errors: /usr/include/bits/sigcontext.h:37: error: expected unqualified-id before ';' token /usr/include/bits/sigcontext.h:37: error: expected ')' before ';' token /usr/include/sys/ucontext.h:44: error: expected unqualified-id before ';' token /usr/include/sys/ucontext.h:44: error: expected ')' before ';' token ``` ...point to two headers which are part of glibc 2.39 (w/ia64 support re-added): * /usr/include/bits/sigcontext.h:32-38: ``` 32 struct __ia64_fpreg 33 { 34 union 35 { 36 unsigned long bits[2]; 37 } u; 38 } __attribute__ ((__aligned__ (16))); ``` * /usr/include/sys/ucontext.h:39-45: ``` 39 struct __ia64_fpreg_mcontext 40 { 41 union 42 { 43 unsigned long __ctx(bits)[2]; 44 } __ctx(u); 45 } __attribute__ ((__aligned__ (16))); ``` Cheers, Frank # Log excerpts for failing tests in libstdc++v3 for 5.0.0 20240528 (experimental) [master revision 236116068151bbc72aaaf53d0f223fe06f7e3bac] (GCC) w/LRA # ## FAIL: 17_intro/names.cc -std=gnu++17 (test for excess errors) ## ``` extra_tool_flags are: -std=gnu++17 -D__GLIBCXX__= Executing on host: /dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc/xg++ -shared-libgcc -B/dev/shm/gcc-15-lra/src.gcc.ia64-to olchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc -nostdinc++ -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux -gnu/libstdc++-v3/src -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/src/.libs -L/dev/shm/gcc-15 -lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/libsupc++/.libs -B/usr/ia64-t2-linux-gnu/bin/ -B/usr/ia64-t2-linux -gnu/lib/ -isystem /usr/ia64-t2-linux-gnu/include -isystem /usr/ia64-t2-linux-gnu/sys-include -fchecking=1 -B/usr/src/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.12334 6.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/./libstdc++-v3/src/.libs -fmessage-length=0 -fno-show-column -ffunction-sections -fdata-sections -g -O2 -fPIC -D_GNU_SOUR CE -DLOCALEDIR="." -nostdinc++ -I/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/include/ia64-t2-li nux-gnu -I/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/include -I/dev/shm/gcc-15-lra/src.gcc.ia6 4-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/libsupc++ -I/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/include/backward -I /dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/testsuite/util /usr/src/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/g cc/libstdc++-v3/testsuite/17_intro/names.cc-std=gnu++17 -D__GLIBCXX__= -fdiagnostics-plain-output -S -o names.s(timeout = 360) spawn -ignore SIGHUP /dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc/xg++ -shared-libgcc -B/dev/shm/gcc-15-lra/src.gcc.ia64- toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc -nostdinc++ -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/src -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/src/.libs -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/libsupc++/.libs -B/usr/ia64-t2-linux-gnu/bin/ -B/usr/ia64-t2-linux-gnu/lib/ -isystem /usr/ia64-t2-linux-gnu/include -isystem /usr/ia64-t2-linux-gnu/sys-include -fchecking=1 -B/usr/src/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/./libstdc++-v3/src/.libs -fmessage-length=0 -fno-show-column -ffunction-se
[PATCH] Improve optimizer to avoid stack spill across pure function call
This patch was inspired from PR 110137. It reduces the amount of stack spilling by ensuring that more values are constant across a pure function call. It does not add any new flag; rather, it makes the optimizer generate more optimal code. For the added test file, the change is the following. As can be seen, the number of memory operations is cut in half (almost, because rbx = rdi also need to be saved in the "after" version). Before: _Z2ggO7MyClass: .LFB653: .cfi_startproc sub rsp, 72 .cfi_def_cfa_offset 80 movdqu xmm1, XMMWORD PTR [rdi] movdqu xmm0, XMMWORD PTR [rdi+16] movaps XMMWORD PTR [rsp+16], xmm1 movaps XMMWORD PTR [rsp], xmm0 call_Z1fv movdqa xmm1, XMMWORD PTR [rsp+16] movdqa xmm0, XMMWORD PTR [rsp] lea rdx, [rsp+32] movaps XMMWORD PTR [rsp+32], xmm1 movaps XMMWORD PTR [rsp+48], xmm0 add rsp, 72 .cfi_def_cfa_offset 8 ret .cfi_endproc After: _Z2ggO7MyClass: .LFB653: .cfi_startproc pushrbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 mov rbx, rdi sub rsp, 32 .cfi_def_cfa_offset 48 call_Z1fv movdqu xmm0, XMMWORD PTR [rbx] movaps XMMWORD PTR [rsp], xmm0 movdqu xmm0, XMMWORD PTR [rbx+16] movaps XMMWORD PTR [rsp+16], xmm0 add rsp, 32 .cfi_def_cfa_offset 16 pop rbx .cfi_def_cfa_offset 8 ret .cfi_endproc As explained in PR 110137, the reason I modify the RTL pass instead of the GIMPLE pass is that currently the code that handle the optimization is in the IRA. The optimization involved is: rewrite definition: a = something; ... use a; to move the definition statement right before the use statement, provided none of the statements inbetween modifies "something". The existing code only handle the case where "something" is a memory reference with a fixed address. The patch modifies the logic to also allow memory reference whose address is not changed by the statements inbetween. In order to do that the only way I can think of is to modify "validate_equiv_mem" to also validate the equivalence of the address, which may consist of a pseudo-register. Nevertheless, reviews and suggestions to improve the code/explain how to implement it in GIMPLE phase would be appreciated. Bootstrapped and regression tested on x86_64-pc-linux-gnu. I think the test passes but there are some spurious failures with some scan-assembler-* tests, looking through it it doesn't appear to pessimize the code. I think this also fix PR 103541 as a side effect, but I'm not sure if the generated code is optimal (it loads from the global variable twice, but then there's no readily usable caller-saved register so you need an additional memory operation anyway) From 3d8d04373716e031242678821c3e9904b6ac51cb Mon Sep 17 00:00:00 2001 From: user202729 Date: Mon, 20 May 2024 18:12:58 +0800 Subject: [PATCH] Allow moving init from dereference across pure function call gcc/ChangeLog: * ira.cc (struct equiv_mem_data): Rename to equiv_data. (validate_equiv_mem_from_store): Rename to validate_equiv_from_store. (uses_any_clobbered_hard_reg_p): New function. (validate_equiv_mem): Rename to valid_equiv. (equiv_init_movable_p): Relax several conditions. (update_equiv_regs): Relax condition that the source must be mem. (add_store_equivs): Update changed function name. gcc/testsuite/ChangeLog: * gcc.target/i386/avoid-stack-spill-pure-call.C: New test. --- gcc/ira.cc| 157 -- .../i386/avoid-stack-spill-pure-call.C| 32 2 files changed, 140 insertions(+), 49 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avoid-stack-spill-pure-call.C diff --git a/gcc/ira.cc b/gcc/ira.cc index 5642aea3caa..1a1718a18c5 100644 --- a/gcc/ira.cc +++ b/gcc/ira.cc @@ -3054,28 +3054,80 @@ struct equivalence static struct equivalence *reg_equiv; /* Used for communication between the following two functions. */ -struct equiv_mem_data +struct equiv_data { - /* A MEM that we wish to ensure remains unchanged. */ - rtx equiv_mem; + /* A rtx that we wish to ensure remains unchanged. */ + rtx equiv; - /* Set true if EQUIV_MEM is modified. */ - bool equiv_mem_modified; + /* Set true if EQUIV is modified. */ + bool equiv_modified; }; -/* If EQUIV_MEM is modified by modifying DEST, indicate that it is modified. +/* If EQUIV is modified by modifying DEST, indicate that it is modified. Called via note_stores. */ stati
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
Richard Biener writes: > On Fri, 14 Jun 2024, Richard Sandiford wrote: > >> Richard Biener writes: >> > The following retires vcond{,u,eq} optabs by stopping to use them >> > from the middle-end. Targets instead (should) implement vcond_mask >> > and vec_cmp{,u,eq} optabs. The PR this change refers to lists >> > possibly affected targets - those implementing these patterns, >> > and in particular it lists mips, sparc and ia64 as targets that >> > most definitely will regress while others might simply remove >> > their vcond{,u,eq} patterns. >> > >> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. >> > I know riscv doesn't implement any of the legacy optabs. But less >> > maintained vector targets might need adjustments. >> > >> > I want to get rid of those optabs for GCC 15. If I don't hear from >> > you I will assume your target is fine. >> >> Great! Thanks for doing this. >> >> Is there a plan for how we should handle vector comparisons that >> have to be done as the inverse of the negated condition? Should >> targets simply not provide vec_cmp for such conditions and leave >> the target-independent code to deal with the fallout? (For a >> standalone comparison, it would invert the result. For a VEC_COND_EXPR >> it would swap the true and false values.) > > I would expect that the ISEL pass which currently deals with finding > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this. > So how do we deal with this right now? I expect RTL expansion will > do the inverse trick, no? I think in practice (at least for the targets I've worked on), the target's vec_cmp handles the inversion itself. Thus the main optimisation done by targets' vcond patterns is to avoid the inversion (and instead swap the true/false values) when the "opposite" comparison is the native one. Thanks, Richard
[PATCH 1/2] Add function filtering to gcov
Add the --include and --exclude flags to gcov to control what functions to report on. This is meant to make gcov more practical as an when writing test suites or performing other coverage experiments, which tends to focus on a few functions at the time. This really shines in combination with the -t/--stdout flag. With support for more expansive metrics in gcov like modified condition/decision coverage (MC/DC) and path coverage, output quickly gets overwhelming without filtering. The approach is quite simple: filters are egrep regexes and are evaluated left-to-right, and the last filter "wins", that is, if a function matches an --include and a subsequent --exclude, it should not be included in the output. All of the output machinery works on the function table, so by optionally (not) adding function makes the even the json output work as expected, and only minor changes are needed to suppress the filtered-out functions. Demo: math.c int mul (int a, int b) { return a * b; } int sub (int a, int b) { return a - b; } int sum (int a, int b) { return a + b; } Plain matches: $ gcov -t math --include=sum -:0:Source:filter.c -:0:Graph:filter.gcno -:0:Data:- -:0:Runs:0 #:9:int sum (int a, int b) { #: 10:return a + b; $ gcov -t math --include=mul -:0:Source:filter.c -:0:Graph:filter.gcno -:0:Data:- -:0:Runs:0 #:1:int mul (int a, int b) { #:2:return a * b; Regex match: $ gcov -t math --include=su -:0:Source:filter.c -:0:Graph:filter.gcno -:0:Data:- -:0:Runs:0 #:5:int sub (int a, int b) { #:6:return a - b; -:7:} #:9:int sum (int a, int b) { #: 10:return a + b; And similar for exclude: $ gcov -t math --exclude=sum -:0:Source:filter.c -:0:Graph:filter.gcno -:0:Data:- -:0:Runs:0 #:1:int mul (int a, int b) { #:2:return a * b; -:3:} #:5:int sub (int a, int b) { #:6:return a - b; And json, for good measure: $ gcov -t math --include=sum --json | jq ".files[].lines[]" { "line_number": 9, "function_name": "sum", "count": 0, "unexecuted_block": true, "block_ids": [], "branches": [], "calls": [] } { "line_number": 10, "function_name": "sum", "count": 0, "unexecuted_block": true, "block_ids": [ 2 ], "branches": [], "calls": [] } Note that the last function gets "clipped" when lines are associated to functions, which means the closing brace is dropped from the report. I hope this can be fixed, but considering it is not really a part of the function body, the gcov report is "complete". Matching generally work well for mangled names, as the mangled names also have the base symbol name in it. By default, functions are matched by the mangled name, which means matching on base names always work as expected. The -M flag makes the matching work on the demangled name which is quite useful when you only want to report on specific overloads and can use the full type names. Why not just use grep? grep is not really sufficient as grep is very line oriented, and the reports that benefit the most from filtering often unpredictably span multiple lines based on the state of coverage. For example, a condition coverage report for 3 terms/6 outcomes only outputs 1 line when all conditions are covered, and 7 with no lines covered. gcc/ChangeLog: * doc/gcov.texi: Add --include, --exclude, --match-on-demangled documentation. * gcov.cc (struct fnfilter): New. (print_usage): Add --include, --exclude, -M, --match-on-demangled. (process_args): Likewise. (release_structures): Release filters. (read_graph_file): Only add function_infos matching filters. (output_lines): Likewise. gcc/testsuite/ChangeLog: * lib/gcov.exp: Add filtering test function. * g++.dg/gcov/gcov-19.C: New test. * g++.dg/gcov/gcov-20.C: New test. * g++.dg/gcov/gcov-21.C: New test. * gcc.misc-tests/gcov-25.c: New test. * gcc.misc-tests/gcov-26.c: New test. * gcc.misc-tests/gcov-27.c: New test. * gcc.misc-tests/gcov-28.c: New test. --- gcc/doc/gcov.texi | 27 ++ gcc/gcov.cc| 128 +++-- gcc/testsuite/g++.dg/gcov/gcov-19.C| 35 +++ gcc/testsuite/g++.dg/gcov/gcov-20.C| 38 gcc/testsuite/g++.dg/gcov/gcov-21.C| 32 +++ gcc/testsuite/gcc.misc-tests/gcov-25.c | 25 + gcc/testsuite/gcc.misc-tests/gcov-26.c | 25 + gcc/testsuite/gcc.misc-tests/gcov-27.c | 24 + gcc/testsuite/gcc.misc-tests/gcov-28.c | 22 + gcc/testsuite/lib/gcov.exp | 53 +- 10 fil
[PATCH 2/2] Add examples on filteirng in gcov tutorial
Add a section with some examples on the --include and --exclude flags in the gcov tutorial. gcc/ChangeLog: * doc/gcov.texi: Add tutorial on function filtering. --- gcc/doc/gcov.texi | 86 +++ 1 file changed, 86 insertions(+) diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi index a9690bf5a69..dc79bccb8cf 100644 --- a/gcc/doc/gcov.texi +++ b/gcc/doc/gcov.texi @@ -933,6 +933,92 @@ the file doesn't match the executable (differing number of basic block counts) it will ignore the contents of the file. It then adds in the new execution counts and finally writes the data to the file. +You can report on a subset of functions by using @option{--include} +and @option{--exclude}. This is very useful when combined with +@option{--stdout} trying to understand behavior and coverage for a +particular function by running a test, looking at @command{gcov} +output, testing another input, and running @command{gcov} again. + +@smallexample +$ gcov -m --stdout --include inc tmp +-:0:Source:tmp.cpp +-:0:Graph:tmp.gcno +-:0:Data:tmp.gcda +-:0:Runs:1 + 2*:8: void inc () @{ b++; @} +-- +Foo::inc(): +#:8: void inc () @{ b++; @} +-- +Foo::inc(): +2:8: void inc () @{ b++; @} +-- +@end smallexample + +@command{gcov} will match on mangled names by default, which you can +control with the @option{-M} flag. Note that matching and reporting +are independent, so you can match on mangled names while printing +demangled names, and vice versa. To report on the @code{int} +instantiation of @code{Foo} matching on mangled and demangled names: + +@smallexample +$ gcov -t -m -M tmp --include 'Foo' +-:0:Source:tmp.cpp +-:0:Graph:tmp.gcno +-:0:Data:tmp.gcda +-:0:Runs:1 +1:7: Foo(): b (1000) @{@} +2:8: void inc () @{ b++; @} +@end smallexample + +@smallexample +$ gcov -t -m tmp --include 'FooIi' +-:0:Source:tmp.cpp +-:0:Graph:tmp.gcno +-:0:Data:tmp.gcda +-:0:Runs:1 +1:7: Foo(): b (1000) @{@} +2:8: void inc () @{ b++; @} +@end smallexample + +The arguments to @option{--include} and @option{--exclude} are +extended regular expressions (like @command{grep -E}), so the pattern +@code{in.?} matches both @code{inc} and @code{main}. If used with +@option{-M} then all @code{int} instantiations of @code{Foo} would +match too. @option{--include} and @option{--exclude} can be used +multiple times, and if a name matches multiple filters it is the last +one to match which takes preference. For example, to match +@code{main} and the @code{int} instatiation of @code{inc}, while +omitting the @code{Foo} constructor: + +@smallexample +$ gcov -t -m -M --include in --exclude Foo --include '::inc' tmp +-:0:Source:tmp.cpp +-:0:Graph:tmp.gcno +-:0:Data:tmp.gcda +-:0:Runs:1 +2:8: void inc () @{ b++; @} +1: 18:main (void) +-: 19:@{ +-: 20: int i, total; +1: 21: Foo counter; +-: 22: +1: 23: counter.inc(); +1: 24: counter.inc(); +1: 25: total = 0; +-: 26: + 11: 27: for (i = 0; i < 10; i++) + 10: 28:total += i; +-: 29: + 1*: 30: int v = total > 100 ? 1 : 2; +-: 31: +1: 32: if (total != 45) +#: 33:printf ("Failure\n"); +-: 34: else +1: 35:printf ("Success\n"); +1: 36: return 0; +@end smallexample + @node Gcov and Optimization @section Using @command{gcov} with GCC Optimization -- 2.30.2
[PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]
Hi All, When the patch for PR114074 was applied we saw a good boost in exchange2. This boost was partially caused by a simplification of the addressing modes. With the patch applied IV opts saw the following form for the base addressing; Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) l0_19(D) * 324) + 36) vs what we normally get: Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D) * 81) + 9) * 4 This is because the patch promoted multiplies where one operand is a constant from a signed multiply to an unsigned one, to attempt to fold away the constant. This patch attempts the same but due to the various problems with SCEV and niters not being able to analyze the resulting forms (i.e. PR114322) we can't do it during SCEV or in the general form like in fold-const like extract_muldiv attempts. Instead this applies the simplification during IVopts initialization when we create the IV. Essentially when we know the IV won't overflow with regards to niters then we perform an affine fold which gets it to simplify the internal computation, even if this is signed because we know that for IVOPTs uses the IV won't ever overflow. This allows IV opts to see the simplified form without influencing the rest of the compiler. as mentioned in PR114074 it would be good to fix the missed optimization in the other passes so we can perform this in general. The reason this has a big impact on fortran code is that fortran doesn't seem to have unsigned integer types. As such all it's addressing are created with signed types and folding does not happen on them due to the possible overflow. concretely on AArch64 this changes the results from generation: mov x27, -108 mov x24, -72 mov x23, -36 add x21, x1, x0, lsl 2 add x19, x20, x22 .L5: add x0, x22, x19 add x19, x19, 324 ldr d1, [x0, x27] add v1.2s, v1.2s, v15.2s str d1, [x20, 216] ldr d0, [x0, x24] add v0.2s, v0.2s, v15.2s str d0, [x20, 252] ldr d31, [x0, x23] add v31.2s, v31.2s, v15.2s str d31, [x20, 288] bl digits_20_ cmp x21, x19 bne .L5 into: .L5: ldr d1, [x19, -108] add v1.2s, v1.2s, v15.2s str d1, [x20, 216] ldr d0, [x19, -72] add v0.2s, v0.2s, v15.2s str d0, [x20, 252] ldr d31, [x19, -36] add x19, x19, 324 add v31.2s, v31.2s, v15.2s str d31, [x20, 288] bl digits_20_ cmp x21, x19 bne .L5 The two patches together results in a 10% performance increase in exchange2 in SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile time. There's also a 5% performance improvement in fotonik3d and similar reduction in binary size. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR tree-optimization/114932 * tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned fold. gcc/testsuite/ChangeLog: PR tree-optimization/114932 * gfortran.dg/addressing-modes_1.f90: New test. --- diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 new file mode 100644 index ..334d5bc47a16e53e9168bb1f90dfeff584b4e494 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 @@ -0,0 +1,37 @@ +! { dg-do compile { target aarch64-*-* } } +! { dg-additional-options "-w -Ofast" } + + module brute_force +integer, parameter :: r=9 + integer block(r, r, 0) +contains + subroutine brute + do + do + do + do +do + do + do i7 = l0, 1 + select case(1 ) + case(1) + block(:2, 7:, 1) = block(:2, 7:, i7) - 1 + end select +do i8 = 1, 1 + do i9 = 1, 1 +if(1 == 1) then +call digits_20 +end if +end do + end do +end do +end do + end do + end do + end do + end do + end do + end + end + +! { dg-final { scan-assembler-not {ldr\s+d([0-9]+),\s+\[x[0-9]+, x[0-9]+\]} } } diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index 4338d7b64a6c2df6404b8d5e51c7f62c23006e72..f621e4ee924b930e1e1d68e35f3d3a0d52470811 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -1216,6 +1216,18 @@ alloc_iv (struct ivopts_data *data, tree base, tree step, ba
[PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
Hi All, IVOPTS normally uses affine trees to perform comparisons between different IVs, but these seem to have been missing in two key spots and instead normal tree equivalencies used. In some cases where we have a structural equivalence but not a signedness equivalencies we end up generating both a signed and unsigned IV for the same candidate. This happens quite a lot with fortran but can also happen in C because this came code is unable to figure out when one expression is a multiple of another. As an example in the attached testcase we get: Initial set of candidates: cost: 24 (complexity 3) reg_cost: 9 cand_cost: 15 cand_group_cost: 0 (complexity 3) candidates: 1, 6, 8 group:0 --> iv_cand:6, cost=(0,1) group:1 --> iv_cand:1, cost=(0,0) group:2 --> iv_cand:8, cost=(0,1) group:3 --> iv_cand:8, cost=(0,1) invariant variables: 6 invariant expressions: 1, 2 : inv_expr 1: stride.3_27 * 4 inv_expr 2: (unsigned long) stride.3_27 * 4 These end up being used in the same group: Group 1: cand costcompl. inv.expr. inv.vars 1 0 0 NIL;6 2 0 0 NIL;6 3 0 0 NIL;6 which ends up with IV opts picking the signed and unsigned IVs: Improved to: cost: 24 (complexity 3) reg_cost: 9 cand_cost: 15 cand_group_cost: 0 (complexity 3) candidates: 1, 6, 8 group:0 --> iv_cand:6, cost=(0,1) group:1 --> iv_cand:1, cost=(0,0) group:2 --> iv_cand:8, cost=(0,1) group:3 --> iv_cand:8, cost=(0,1) invariant variables: 6 invariant expressions: 1, 2 and so generates the same IV as both signed and unsigned: ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot ;;prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) ;;pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE) ;;25 [always] count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE) # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> # ivtmp.26_51 = PHI # ivtmp.28_90 = PHI ... ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot ;;prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' ;;pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) (FALLTHRU) ;;21 [33.3% (guessed)] count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE) ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE) # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> ivtmp.22_82 = ivtmp.22_41 + 1; ivtmp.26_72 = ivtmp.26_51 + _80; ivtmp.28_98 = ivtmp.28_90 + _39; These two IVs are always used as unsigned, so IV ops generates: _73 = stride.3_27 * 4; _80 = (unsigned long) _73; _54 = (unsigned long) stride.3_27; _39 = _54 * 4; Which means that in e.g. exchange2 we generate a lot of duplicate code. This is because candidate 6 and 8 are structurally equivalent but have different signs. This patch changes it so that if you have two IVs that are affine equivalent to just pick one over the other. IV already has code for this, so the patch just uses affine trees instead of tree for the check. With it we get: : inv_expr 1: stride.3_27 * 4 : Group 0: cand costcompl. inv.expr. inv.vars 5 0 2 NIL;NIL; 6 0 3 NIL;NIL; Group 1: cand costcompl. inv.expr. inv.vars 1 0 0 NIL;6 2 0 0 NIL;6 3 0 0 NIL;6 4 0 0 NIL;6 Initial set of candidates: cost: 16 (complexity 3) reg_cost: 6 can
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On Fri, 14 Jun 2024, Richard Sandiford wrote: > Richard Biener writes: > > On Fri, 14 Jun 2024, Richard Sandiford wrote: > > > >> Richard Biener writes: > >> > The following retires vcond{,u,eq} optabs by stopping to use them > >> > from the middle-end. Targets instead (should) implement vcond_mask > >> > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > >> > possibly affected targets - those implementing these patterns, > >> > and in particular it lists mips, sparc and ia64 as targets that > >> > most definitely will regress while others might simply remove > >> > their vcond{,u,eq} patterns. > >> > > >> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > >> > I know riscv doesn't implement any of the legacy optabs. But less > >> > maintained vector targets might need adjustments. > >> > > >> > I want to get rid of those optabs for GCC 15. If I don't hear from > >> > you I will assume your target is fine. > >> > >> Great! Thanks for doing this. > >> > >> Is there a plan for how we should handle vector comparisons that > >> have to be done as the inverse of the negated condition? Should > >> targets simply not provide vec_cmp for such conditions and leave > >> the target-independent code to deal with the fallout? (For a > >> standalone comparison, it would invert the result. For a VEC_COND_EXPR > >> it would swap the true and false values.) > > > > I would expect that the ISEL pass which currently deals with finding > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this. > > So how do we deal with this right now? I expect RTL expansion will > > do the inverse trick, no? > > I think in practice (at least for the targets I've worked on), > the target's vec_cmp handles the inversion itself. Thus the > main optimisation done by targets' vcond patterns is to avoid > the inversion (and instead swap the true/false values) when the > "opposite" comparison is the native one. I see. I suppose whether or not vec_cmp is handled is determined by a FAIL so it's somewhat difficult to determine this at ISEL time. Meanwhile I've opened PR115490 with the x86 fallout from applying the patch. Richard.
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On Fri, 14 Jun 2024, Richard Biener wrote: > On Fri, 14 Jun 2024, Richard Sandiford wrote: > > > Richard Biener writes: > > > On Fri, 14 Jun 2024, Richard Sandiford wrote: > > > > > >> Richard Biener writes: > > >> > The following retires vcond{,u,eq} optabs by stopping to use them > > >> > from the middle-end. Targets instead (should) implement vcond_mask > > >> > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > > >> > possibly affected targets - those implementing these patterns, > > >> > and in particular it lists mips, sparc and ia64 as targets that > > >> > most definitely will regress while others might simply remove > > >> > their vcond{,u,eq} patterns. > > >> > > > >> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > > >> > I know riscv doesn't implement any of the legacy optabs. But less > > >> > maintained vector targets might need adjustments. > > >> > > > >> > I want to get rid of those optabs for GCC 15. If I don't hear from > > >> > you I will assume your target is fine. > > >> > > >> Great! Thanks for doing this. > > >> > > >> Is there a plan for how we should handle vector comparisons that > > >> have to be done as the inverse of the negated condition? Should > > >> targets simply not provide vec_cmp for such conditions and leave > > >> the target-independent code to deal with the fallout? (For a > > >> standalone comparison, it would invert the result. For a VEC_COND_EXPR > > >> it would swap the true and false values.) > > > > > > I would expect that the ISEL pass which currently deals with finding > > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this. > > > So how do we deal with this right now? I expect RTL expansion will > > > do the inverse trick, no? > > > > I think in practice (at least for the targets I've worked on), > > the target's vec_cmp handles the inversion itself. Thus the > > main optimisation done by targets' vcond patterns is to avoid > > the inversion (and instead swap the true/false values) when the > > "opposite" comparison is the native one. > > I see. I suppose whether or not vec_cmp is handled is determined > by a FAIL so it's somewhat difficult to determine this at ISEL time. I'll also note that we document vec_cmp{,u,eq} as having all zeros, all ones for the result while vcond_mask might only care for the MSB (it's documented to work on the result of a pre-computed vector comparison). So this eventually asks for targets to work out the optimal sequence via combine helpers and thus eventually splitters to fixup invalid compare operators late? Richard.
[PATCH] Build/Cross: Look for target headers from include if sys-include doesn't exist
PR 115416 When we build a cross toolchain, while without --with-sysroot, target headers are expected in ${test_exec_prefix}/${target_noncanonical}/sys-include while it is true only with --with-headers option is used. In other cases, the path should be ${test_exec_prefix}/${target_noncanonical}/include such as Debian's cross toolchain. Debian's cross toolchain has directory structures like: /usr//lib /include /bin/ For this case, we cannot use "--prefix=/usr --with-sysroot=/", as gcc/configure will use headers of build, aka in /usr/include to detect features. And fixinclude also uses the headers of build. Let's use the `include` if `sys-include` doesn't exist. For Makefile.in, the compare @includedir@ and $(prefix)/include is not correct, as the --includedir option is used to set where the headers should be installed. gcc: PR 115415. configure.ac: Set target_header_dir and CROSS_SYSTEM_HEADER_DIR to ${test_exec_prefix}/${target_noncanonical}/include when cross and without --with-sysroot and without --with-headers. configure: Regenerate. Makefile.in: Set CROSS_SYSTEM_HEADER_DIR as configure, and don't compare @includedir@ and $(prefix)/include. --- gcc/Makefile.in | 6 +- gcc/configure| 8 ++-- gcc/configure.ac | 4 3 files changed, 11 insertions(+), 7 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index f5adb647d3f..349f988dc08 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -560,11 +560,7 @@ LINKER_PLUGIN_API_H = $(srcdir)/../include/plugin-api.h # Default native SYSTEM_HEADER_DIR, to be overridden by targets. NATIVE_SYSTEM_HEADER_DIR = @NATIVE_SYSTEM_HEADER_DIR@ # Default cross SYSTEM_HEADER_DIR, to be overridden by targets. -ifeq (@includedir@,$(prefix)/include) - CROSS_SYSTEM_HEADER_DIR = @CROSS_SYSTEM_HEADER_DIR@ -else - CROSS_SYSTEM_HEADER_DIR = @includedir@ -endif +CROSS_SYSTEM_HEADER_DIR = @CROSS_SYSTEM_HEADER_DIR@ # autoconf sets SYSTEM_HEADER_DIR to one of the above. # Purge it of unnecessary internal relative paths diff --git a/gcc/configure b/gcc/configure index aaf5899cc03..d11e97d1758 100755 --- a/gcc/configure +++ b/gcc/configure @@ -15124,6 +15124,10 @@ if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x || target_header_dir="${with_build_sysroot}${native_system_header_dir}" elif test "x$with_sysroot" = x; then target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include" +if ! test -d ${target_header_dir};then + target_header_dir="${test_exec_prefix}/${target_noncanonical}/include" +fi +CROSS_SYSTEM_HEADER_DIR=${target_header_dir} elif test "x$with_sysroot" = xyes; then target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-root${native_system_header_dir}" else @@ -21410,7 +21414,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 21413 "configure" +#line 21417 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -21516,7 +21520,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 21519 "configure" +#line 21523 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/gcc/configure.ac b/gcc/configure.ac index f8d67efeb98..54e6776747e 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -2512,6 +2512,10 @@ if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x || target_header_dir="${with_build_sysroot}${native_system_header_dir}" elif test "x$with_sysroot" = x; then target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include" +if test -d ${target_header_dir};then + target_header_dir="${test_exec_prefix}/${target_noncanonical}/include" +fi +CROSS_SYSTEM_HEADER_DIR=${target_header_dir} elif test "x$with_sysroot" = xyes; then target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-root${native_system_header_dir}" else -- 2.39.3 (Apple Git-146)
[PATCHSET 14.2] Add documentation rules for Rust frontend
Hi everyone, This is a quick and simple patchset to our Makefile in order to have proper documentation rules for the Rust frontend. Let me know if these changes would be accepted into 14.2, or if we should rather integrate them to trunk for 15.1 as part of our upstreaming process. If the changes are accepted for 14.2, I will be following them up with .texi changes to improve the documentation of the Rust frontend. Best, Arthur [PATCH 14.2 1/3] gccrs: Add base documentation for using the Rust [PATCH 14.2 2/3] rust: Add rust.install-dvi and rust.install-html [PATCH 14.2 3/3] rust: Copy install-html rule from Ada frontend
[PATCH 14.2 2/3] rust: Add rust.install-dvi and rust.install-html rules
From: Christophe Lyon rust has the (empty) rust.dvi and rust.html rules, but lacks the (empty) rust.install-dvi and rust.install-html ones. 2024-04-04 Christophe Lyon gcc/rust/ * Make-lang.in (rust.install-dvi, rust.install-html): New rules. --- gcc/rust/Make-lang.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in index f0fc582bd26..2395541425f 100644 --- a/gcc/rust/Make-lang.in +++ b/gcc/rust/Make-lang.in @@ -374,6 +374,8 @@ selftest-rust-valgrind: $(RUST_SELFTEST_DEPS) # should have dependencies on info files that should be installed. rust.install-info: +rust.install-dvi: +rust.install-html: rust.install-pdf: # Install man pages for the front end. This target should ignore errors. -- 2.42.0
[PATCH 14.2 1/3] gccrs: Add base documentation for using the Rust frontend.
gcc/rust/ChangeLog: * Make-lang.in: Add documentation targets. * gccrs.texi: New file. --- gcc/rust/Make-lang.in | 25 +++-- gcc/rust/gccrs.texi | 207 ++ 2 files changed, 225 insertions(+), 7 deletions(-) create mode 100644 gcc/rust/gccrs.texi diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in index dd94c9b5eab..f0fc582bd26 100644 --- a/gcc/rust/Make-lang.in +++ b/gcc/rust/Make-lang.in @@ -283,27 +283,38 @@ rust.tags: force # Build documentation hooks. +RUST_TEXI = \ + rust/rust.texi + # Build info documentation for the front end, in the build directory. This target is only called by # ‘make bootstrap’ if a suitable version of makeinfo is available, so does not need to check for this, # and should fail if an error occurs. rust.info: + if [ x$(BUILD_INFO) = xinfo ]; then \ + $(MAKEINFO) $(MAKEINFOFLAGS) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $<; \ + else \ + true; \ + fi rust.srcinfo: # Build DVI documentation for the front end, in the build directory. This should be done using # $(TEXI2DVI), with appropriate -I arguments pointing to directories of included files. -rust.dvi: +rust.dvi: $(RUST_TEXI) + $(TEXI2DVI) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $< # Build PDF documentation for the front end, in the build directory. This should be done using # $(TEXI2PDF), with appropriate -I arguments pointing to directories of included files. -rust.pdf: - -doc/rust.info: -doc/rust.dvi: -doc/rust.pdf: +rust.pdf: $(RUST_TEXI) + $(TEXI2PDF) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $< # Build HTML documentation for the front end, in the build directory. -rust.html: +rust.html: $(RUST_TEXI) + $(TEXI2HTML) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $< + +doc/rust.info: rust.info +doc/rust.dvi: rust.dvi +doc/rust.pdf: rust.pdf # Install hooks. diff --git a/gcc/rust/gccrs.texi b/gcc/rust/gccrs.texi new file mode 100644 index 000..52f055eaad8 --- /dev/null +++ b/gcc/rust/gccrs.texi @@ -0,0 +1,207 @@ +\input texinfo @c -*-texinfo-*- +@setfilename gccrs.info +@settitle The GNU Rust Compiler + +@c Merge the standard indexes into a single one. +@syncodeindex fn cp +@syncodeindex vr cp +@syncodeindex ky cp +@syncodeindex pg cp +@syncodeindex tp cp + +@include gcc-common.texi + +@c Copyright years for this manual. +@set copyrights-gccrs 2024 + +@copying +@c man begin COPYRIGHT +Copyright @copyright{} @value{copyrights-gccrs} Free Software Foundation, Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, the Front-Cover Texts being (a) (see below), and +with the Back-Cover Texts being (b) (see below). +A copy of the license is included in the +@c man end +section entitled ``GNU Free Documentation License''. +@ignore +@c man begin COPYRIGHT +man page gfdl(7). +@c man end +@end ignore + +@c man begin COPYRIGHT + +(a) The FSF's Front-Cover Text is: + + A GNU Manual + +(b) The FSF's Back-Cover Text is: + + You have freedom to copy and modify this GNU Manual, like GNU + software. Copies published by the Free Software Foundation raise + funds for GNU development. +@c man end +@end copying + +@ifinfo +@format +@dircategory Software development +@direntry +* gccrs: (gccrs). A GCC-based compiler for the Rust programming language +@end direntry +@end format + +@insertcopying +@end ifinfo + +@titlepage +@title The GNU Rust Compiler +@versionsubtitle +@author The gccrs team + +@page +@vskip 0pt plus 1filll +Published by the Free Software Foundation @* +51 Franklin Street, Fifth Floor@* +Boston, MA 02110-1301, USA@* +@sp 1 +@insertcopying +@end titlepage +@contents +@page + +@node Top +@top Introduction + +This manual describes how to use @command{gccrs}, the GNU compiler for +the Rust programming language. This manual is specifically about +@command{gccrs}. For more information about the Rust programming +language, see @uref{https//rust-lang.org}. + +@menu +* Copying:: The GNU General Public License. +* GNU Free Documentation License:: +How you can share and copy this manual. +* Invoking gccrs:: How to run gccrs. +* Index:: Index. +@end menu + + +@include gpl_v3.texi + +@include fdl.texi + +@node Invoking gccrs +@chapter Invoking gccrs + +@c man title gccrs A GCC-based compiler for the Rust programming language + +@c man begin DESCRIPTION gccrs + +The @command{gccrs} command is a frontend to @command{gcc} and +supports many of the same options, @xref{Option Summary, , Option +Summary, gcc, Using the GNU Compiler Collection (GCC)}, as well +as some @command{gccrs} specific ones. This manual only documents +the options that are specific to @com
[PATCH 14.2 3/3] rust: Copy install-html rule from Ada frontend
gcc/rust/ChangeLog: * Make-lang.in: Add proper rust.install-html rule. --- gcc/rust/Make-lang.in | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in index 2395541425f..815a8b315c0 100644 --- a/gcc/rust/Make-lang.in +++ b/gcc/rust/Make-lang.in @@ -375,7 +375,24 @@ selftest-rust-valgrind: $(RUST_SELFTEST_DEPS) rust.install-info: rust.install-dvi: -rust.install-html: + +rust.install-html: $(build_htmldir)/rust + @$(NORMAL_INSTALL) + test -z "$(htmldir)" || $(mkinstalldirs) "$(DESTDIR)$(htmldir)" + @for p in $(build_htmldir)/rust; do \ + if test -f "$$p" || test -d "$$p"; then d=""; else d="$(srcdir)/"; fi; \ + f=$(html__strip_dir) \ + if test -d "$$d$$p"; then \ + echo " $(mkinstalldirs) '$(DESTDIR)$(htmldir)/$$f'"; \ + $(mkinstalldirs) "$(DESTDIR)$(htmldir)/$$f" || exit 1; \ + echo " $(INSTALL_DATA) '$$d$$p'/* '$(DESTDIR)$(htmldir)/$$f'"; \ + $(INSTALL_DATA) "$$d$$p"/* "$(DESTDIR)$(htmldir)/$$f"; \ + else \ + echo " $(INSTALL_DATA) '$$d$$p' '$(DESTDIR)$(htmldir)/$$f'"; \ + $(INSTALL_DATA) "$$d$$p" "$(DESTDIR)$(htmldir)/$$f"; \ + fi; \ + done + rust.install-pdf: # Install man pages for the front end. This target should ignore errors. -- 2.42.0
[PATCH] Enhance if-conversion for automatic arrays
Automatic arrays that are not address-taken should not be subject to store data races. This applies to OMP SIMD in-branch lowered functions result array which for the testcase otherwise prevents vectorization with SSE and for AVX and AVX512 ends up with spurious .MASK_STORE to the stack surviving. This inefficiency was noted in PR111793. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Is my idea of store data races correct? At least phiopt uses the same check but for example LIM doesn't special-case locals. PR tree-optimization/111793 * tree-if-conv.cc (ifcvt_memrefs_wont_trap): For stores that do not trap only consider -fstore-data-races when the underlying object is not automatic or has its address taken. * gcc.dg/vect/vect-simd-clone-21.c: New testcase. --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c | 16 gcc/tree-if-conv.cc| 13 +++-- 2 files changed, 27 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c new file mode 100644 index 000..49c52fb59bd --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd" } */ + +#pragma omp declare simd simdlen(4) inbranch +__attribute__((noinline)) int +foo (int a, int b) +{ + return a + b; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target i?86-*-* x86_64-*-* } } } */ +/* if-conversion shouldn't need to resort to masked stores for the result + array created by OMP lowering since that's automatic and does not have + its address taken. */ +/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */ diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index c4c3ed41a44..974c614edf3 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -934,14 +934,23 @@ ifcvt_memrefs_wont_trap (gimple *stmt, vec drs) if (DR_IS_READ (a)) return true; + bool ok = flag_store_data_races; + base = get_base_address (base); + if (DECL_P (base) + && auto_var_in_fn_p (base, cfun->decl) + && ! may_be_aliased (base)) + /* Automatic variables not aliased are not subject to + data races. */ + ok = true; + /* an unconditionaly write won't trap if the base is written to unconditionally. */ if (base_master_dr && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) - return flag_store_data_races; + return ok; /* or the base is known to be not readonly. */ else if (base_object_writable (DR_REF (a))) - return flag_store_data_races; + return ok; } return false; -- 2.35.3
Re: [PATCH 2/3] Enabled LRA for ia64.
On Fri, 14 Jun 2024 at 12:07, Frank Scheiner wrote: > > Dear Jonathan, Jeff, > > On 13.06.24 12:33, Jonathan Wakely wrote: > > On Wed, 12 Jun 2024 at 22:00, Frank Scheiner wrote: > >> Ok, I posted the results as created by contrib/test_summary now: > >> > >> 1. non-LRA version on [1] > >> > >> 2. LRA version on [2] > >> > >> [1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html > >> > >> [2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html > > > > Thanks! > > > > These ones are probably due to non-reserved names in glibc or kernel > > headers: > > > > FAIL: 17_intro/names.cc -std=gnu++17 (test for excess errors) > > FAIL: 17_intro/names_pstl.cc -std=gnu++17 (test for excess errors) > > FAIL: experimental/names.cc -std=gnu++17 (test for excess errors) > > > > The errors for all three are probably the same and should be > > decipherable from libstdc++.log which will show which names defined as > > macros in names.cc are clashing with names in system headers. > > Thanks for the hints. I looked into the log now and searched for the > respective tests. And the log excerpts for all three failing tests (also > attached with more context): > > ``` > FAIL: 17_intro/names.cc -std=gnu++17 (test for excess errors) > Excess errors: > /usr/include/bits/sigcontext.h:37: error: expected unqualified-id before > ';' token > /usr/include/bits/sigcontext.h:37: error: expected ')' before ';' token > /usr/include/sys/ucontext.h:44: error: expected unqualified-id before > ';' token > /usr/include/sys/ucontext.h:44: error: expected ')' before ';' token > ``` > > ...point to two headers which are part of glibc 2.39 (w/ia64 support > re-added): > > * /usr/include/bits/sigcontext.h:32-38: > ``` > 32 struct __ia64_fpreg > 33 { > 34 union > 35 { > 36 unsigned long bits[2]; > 37 } u; > 38 } __attribute__ ((__aligned__ (16))); > ``` > > * /usr/include/sys/ucontext.h:39-45: > ``` > 39 struct __ia64_fpreg_mcontext > 40 { > 41 union > 42 { > 43 unsigned long __ctx(bits)[2]; > 44 } __ctx(u); > 45 } __attribute__ ((__aligned__ (16))); > ``` This is an ia64-specific glibc bug, because it means the following valid C program is rejected: #define _GNU_SOURCE #define bits 111 #define u no u #include int main() { }
[PATCH] analyzer: Fix g++ 4.8 bootstrap without using std::move to return std::unique_ptr
This builds successfully using both gcc-4.8.5 and gcc-13.3.1 as host compiler. OK for trunk if testing succeeds? -- >8 -- Revert the changes in r15--ge22b7f741ab54f and fix bootstrap with GCC 4.8 a different way. The original problem is not related to C++17 guaranteed copy elision, it's related to Core DR 1579 [1], which was part of C++14 but only implemented in G++ as a C++11 DR with r5-1576-gfb682f9458c6cf (so GCC 4.8 doesn't implement it). The original fix causes -Wredundant-move warnings with GCC trunk. [1] https://cplusplus.github.io/CWG/issues/1579.html gcc/analyzer/ChangeLog * constraint-manager.cc (equiv_class::make_dump_widget): Change return type to match return value and do not use std::move on return value. (bounded_ranges_constraint::make_dump_widget): Likewise. (constraint_manager::make_dump_widget): Likewise. * constraint-manager.h (equiv_class::make_dump_widget): Change return type. (bounded_ranges_constraint::make_dump_widget): Likewise. (constraint_manager::make_dump_widget): Likewise. * program-state.cc (sm_state_map::make_dump_widget): Likewise. (program_state::make_dump_widget): Likewise. * program-state.h (sm_state_map::make_dump_widget): Likewise. (program_state::make_dump_widget): Likewise. * region-model.cc (region_to_value_map::make_dump_widget): Likewise. (region_model::make_dump_widget): Likewise. * region-model.h (region_to_value_map::make_dump_widget): Likewise. (region_model::make_dump_widget): Likewise. * region.cc (region::make_dump_widget): Likewise. * region.h (region::make_dump_widget): Likewise. * store.cc (binding_cluster::make_dump_widget): Likewise. (store::make_dump_widget): Likewise. * store.h (binding_cluster::make_dump_widget): Likewise. (store::make_dump_widget): Likewise. * svalue.cc (svalue::make_dump_widget): Likewise. * svalue.h (svalue::make_dump_widget): Likewise. --- gcc/analyzer/constraint-manager.cc | 12 ++-- gcc/analyzer/constraint-manager.h | 6 +++--- gcc/analyzer/program-state.cc | 8 gcc/analyzer/program-state.h | 4 ++-- gcc/analyzer/region-model.cc | 8 gcc/analyzer/region-model.h| 4 ++-- gcc/analyzer/region.cc | 4 ++-- gcc/analyzer/region.h | 2 +- gcc/analyzer/store.cc | 8 gcc/analyzer/store.h | 4 ++-- gcc/analyzer/svalue.cc | 4 ++-- gcc/analyzer/svalue.h | 2 +- 12 files changed, 33 insertions(+), 33 deletions(-) diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc index a9d58c9cdcf..29539060ebd 100644 --- a/gcc/analyzer/constraint-manager.cc +++ b/gcc/analyzer/constraint-manager.cc @@ -1146,7 +1146,7 @@ equiv_class::to_json () const return ec_obj; } -std::unique_ptr +std::unique_ptr equiv_class::make_dump_widget (const text_art::dump_widget_info &dwi, unsigned id) const { @@ -1176,7 +1176,7 @@ equiv_class::make_dump_widget (const text_art::dump_widget_info &dwi, ec_widget->add_child (tree_widget::make (dwi, &pp)); } - return std::move (ec_widget); + return ec_widget; } /* Generate a hash value for this equiv_class. @@ -1491,7 +1491,7 @@ bounded_ranges_constraint::to_json () const return con_obj; } -std::unique_ptr +std::unique_ptr bounded_ranges_constraint:: make_dump_widget (const text_art::dump_widget_info &dwi) const { @@ -1500,7 +1500,7 @@ make_dump_widget (const text_art::dump_widget_info &dwi) const (tree_widget::from_fmt (dwi, nullptr, "ec%i bounded ranges", m_ec_id.as_int ())); m_ranges->add_to_dump_widget (*brc_widget.get (), dwi); - return std::move (brc_widget); + return brc_widget; } bool @@ -1829,7 +1829,7 @@ constraint_manager::to_json () const return cm_obj; } -std::unique_ptr +std::unique_ptr constraint_manager::make_dump_widget (const text_art::dump_widget_info &dwi) const { using text_art::tree_widget; @@ -1853,7 +1853,7 @@ constraint_manager::make_dump_widget (const text_art::dump_widget_info &dwi) con if (cm_widget->get_num_children () == 0) return nullptr; - return std::move (cm_widget); + return cm_widget; } /* Attempt to add the constraint LHS OP RHS to this constraint_manager. diff --git a/gcc/analyzer/constraint-manager.h b/gcc/analyzer/constraint-manager.h index 31556aebc7a..81e9c7ec035 100644 --- a/gcc/analyzer/constraint-manager.h +++ b/gcc/analyzer/constraint-manager.h @@ -273,7 +273,7 @@ public: json::object *to_json () const; - std::unique_ptr + std::unique_ptr make_dump_widget (const text_art::dump_widget_info &dwi, unsigned id) const; @@ -408,7 +408,7 @@ public: void add_to_hash (inchash::hash *hstate) const; - std::unique_ptr + std::un
Re: [PATCH 2/3] Enabled LRA for ia64.
On 14.06.24 14:53, Jonathan Wakely wrote: On Fri, 14 Jun 2024 at 12:07, Frank Scheiner wrote: ...point to two headers which are part of glibc 2.39 (w/ia64 support re-added): * /usr/include/bits/sigcontext.h:32-38: ``` 32 struct __ia64_fpreg 33 { 34 union 35 { 36 unsigned long bits[2]; 37 } u; 38 } __attribute__ ((__aligned__ (16))); ``` * /usr/include/sys/ucontext.h:39-45: ``` 39 struct __ia64_fpreg_mcontext 40 { 41 union 42 { 43 unsigned long __ctx(bits)[2]; 44 } __ctx(u); 45 } __attribute__ ((__aligned__ (16))); ``` This is an ia64-specific glibc bug, because it means the following valid C program is rejected: #define _GNU_SOURCE #define bits 111 #define u no u #include int main() { } Ok, I see. Should this then be "worked around" like in [1] with something like: ``` #if defined (__linux__) && defined (__ia64__) // defines __ia64_fpreg_mcontext::u #undef u #endif ``` ...in `libstdc++-v3/testsuite/17_intro/names.cc` or should we fix it in glibc instead? [1]: https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=cf5f7791056b3ed993bc8024be767a86157514a9 Cheers, Frank
RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
Hi Richard, Here is one PR related to this patch (by git bisect), details as below. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458 I am still trying to narrow down which change caused this failure or any hints here? Pan -Original Message- From: Richard Biener Sent: Wednesday, May 15, 2024 5:39 PM To: gcc-patches@gcc.gnu.org Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics The following removes the profile based heuristic limiting sinking and instead uses post-dominators to avoid sinking to places that are executed under the same conditions as the earlier location which the profile based heuristic should have guaranteed as well. To avoid regressing this moves the empty-latch check to cover all sink cases. It also stream-lines the resulting select_best_block a bit but avoids adjusting heuristics more with this change. gfortran.dg/streamio_9.f90 starts execute failing with this on x86_64 with -m32 because the (float)i * 9....e-7 compute is sunk across a STOP causing it to be no longer spilled and thus the compare failing due to excess precision. The patch adds -ffloat-store to avoid this, following other similar testcases. This change doesn't fix the testcase in the PR on itself. Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress. PR tree-optimization/114589 * tree-ssa-sink.cc (select_best_block): Remove profile-based heuristics. Instead reject sink locations that sink to post-dominators. Move empty latch check here from statement_sink_location. Also consider early_bb for the loop depth check. (statement_sink_location): Remove superfluous check. Remove empty latch check. (pass_sink_code::execute): Compute/release post-dominators. * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid excess precision when not spilling. --- gcc/testsuite/gfortran.dg/streamio_9.f90 | 1 + gcc/tree-ssa-sink.cc | 62 2 files changed, 20 insertions(+), 43 deletions(-) diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90 index b6bddb973f8..f29ded6ba54 100644 --- a/gcc/testsuite/gfortran.dg/streamio_9.f90 +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90 @@ -1,4 +1,5 @@ ! { dg-do run } +! { dg-options "-ffloat-store" } ! PR29053 Stream IO test 9. ! Contributed by Jerry DeLisle . ! Test case derived from that given in PR by Steve Kargl. diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc index 2f90acb7ef4..2188b7523c7 100644 --- a/gcc/tree-ssa-sink.cc +++ b/gcc/tree-ssa-sink.cc @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts) We want the most control dependent block in the shallowest loop nest. - If the resulting block is in a shallower loop nest, then use it. Else - only use the resulting block if it has significantly lower execution - frequency than EARLY_BB to avoid gratuitous statement movement. We - consider statements with VOPS more desirable to move. - - This pass would obviously benefit from PDO as it utilizes block - frequencies. It would also benefit from recomputing frequencies - if profile data is not available since frequencies often get out - of sync with reality. */ + If the resulting block is in a shallower loop nest, then use it. */ static basic_block select_best_block (basic_block early_bb, @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb, { basic_block best_bb = late_bb; basic_block temp_bb = late_bb; - int threshold; while (temp_bb != early_bb) { + /* Walk up the dominator tree, hopefully we'll find a shallower +loop nest. */ + temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); + /* If we've moved into a lower loop nest, then that becomes our best block. */ if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) best_bb = temp_bb; - - /* Walk up the dominator tree, hopefully we'll find a shallower -loop nest. */ - temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); } /* Placing a statement before a setjmp-like function would be invalid @@ -221,6 +212,16 @@ select_best_block (basic_block early_bb, if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb)) return best_bb; + /* Do not move stmts to post-dominating places on the same loop depth. */ + if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb)) +return early_bb; + + /* If the latch block is empty, don't make it non-empty by sinking + something into it. */ + if (best_bb == early_bb->loop_father->latch + && empty_block_p (best_bb)) +return early_bb; + /* Avoid turning an unconditional read into a conditional one when we still might want to perform vectorization. */ if (best_bb->loop_father == early_bb->loop_father @@ -233,28 +234,7 @@ se
RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
On Fri, 14 Jun 2024, Li, Pan2 wrote: > Hi Richard, > > Here is one PR related to this patch (by git bisect), details as below. > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458 > > I am still trying to narrow down which change caused this failure or any > hints here? It definitely looks like a latent issue being triggered. Either in LRA or in how the target presents itself. Richard. > Pan > > -Original Message- > From: Richard Biener > Sent: Wednesday, May 15, 2024 5:39 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] tree-optimization/114589 - remove profile based sink > heuristics > > The following removes the profile based heuristic limiting sinking > and instead uses post-dominators to avoid sinking to places that > are executed under the same conditions as the earlier location which > the profile based heuristic should have guaranteed as well. > > To avoid regressing this moves the empty-latch check to cover all > sink cases. > > It also stream-lines the resulting select_best_block a bit but avoids > adjusting heuristics more with this change. gfortran.dg/streamio_9.f90 > starts execute failing with this on x86_64 with -m32 because the > (float)i * 9....e-7 compute is sunk across a STOP causing it > to be no longer spilled and thus the compare failing due to excess > precision. The patch adds -ffloat-store to avoid this, following > other similar testcases. > > This change doesn't fix the testcase in the PR on itself. > > Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress. > > PR tree-optimization/114589 > * tree-ssa-sink.cc (select_best_block): Remove profile-based > heuristics. Instead reject sink locations that sink > to post-dominators. Move empty latch check here from > statement_sink_location. Also consider early_bb for the > loop depth check. > (statement_sink_location): Remove superfluous check. Remove > empty latch check. > (pass_sink_code::execute): Compute/release post-dominators. > > * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid > excess precision when not spilling. > --- > gcc/testsuite/gfortran.dg/streamio_9.f90 | 1 + > gcc/tree-ssa-sink.cc | 62 > 2 files changed, 20 insertions(+), 43 deletions(-) > > diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 > b/gcc/testsuite/gfortran.dg/streamio_9.f90 > index b6bddb973f8..f29ded6ba54 100644 > --- a/gcc/testsuite/gfortran.dg/streamio_9.f90 > +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90 > @@ -1,4 +1,5 @@ > ! { dg-do run } > +! { dg-options "-ffloat-store" } > ! PR29053 Stream IO test 9. > ! Contributed by Jerry DeLisle . > ! Test case derived from that given in PR by Steve Kargl. > diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc > index 2f90acb7ef4..2188b7523c7 100644 > --- a/gcc/tree-ssa-sink.cc > +++ b/gcc/tree-ssa-sink.cc > @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, > bool *debug_stmts) > > We want the most control dependent block in the shallowest loop nest. > > - If the resulting block is in a shallower loop nest, then use it. Else > - only use the resulting block if it has significantly lower execution > - frequency than EARLY_BB to avoid gratuitous statement movement. We > - consider statements with VOPS more desirable to move. > - > - This pass would obviously benefit from PDO as it utilizes block > - frequencies. It would also benefit from recomputing frequencies > - if profile data is not available since frequencies often get out > - of sync with reality. */ > + If the resulting block is in a shallower loop nest, then use it. */ > > static basic_block > select_best_block (basic_block early_bb, > @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb, > { >basic_block best_bb = late_bb; >basic_block temp_bb = late_bb; > - int threshold; > >while (temp_bb != early_bb) > { > + /* Walk up the dominator tree, hopefully we'll find a shallower > + loop nest. */ > + temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); > + >/* If we've moved into a lower loop nest, then that becomes >our best block. */ >if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) > best_bb = temp_bb; > - > - /* Walk up the dominator tree, hopefully we'll find a shallower > - loop nest. */ > - temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); > } > >/* Placing a statement before a setjmp-like function would be invalid > @@ -221,6 +212,16 @@ select_best_block (basic_block early_bb, >if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb)) > return best_bb; > > + /* Do not move stmts to post-dominating places on the same loop depth. */ > + if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb)) > +return early_bb; > + > + /* If the latch block is empty, do
RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
> It definitely looks like a latent issue being triggered. Either in LRA > or in how the target presents itself. Thanks Richard, will have a try and keep you posted. Pan -Original Message- From: Richard Biener Sent: Friday, June 14, 2024 9:11 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org Subject: RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics On Fri, 14 Jun 2024, Li, Pan2 wrote: > Hi Richard, > > Here is one PR related to this patch (by git bisect), details as below. > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458 > > I am still trying to narrow down which change caused this failure or any > hints here? It definitely looks like a latent issue being triggered. Either in LRA or in how the target presents itself. Richard. > Pan > > -Original Message- > From: Richard Biener > Sent: Wednesday, May 15, 2024 5:39 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] tree-optimization/114589 - remove profile based sink > heuristics > > The following removes the profile based heuristic limiting sinking > and instead uses post-dominators to avoid sinking to places that > are executed under the same conditions as the earlier location which > the profile based heuristic should have guaranteed as well. > > To avoid regressing this moves the empty-latch check to cover all > sink cases. > > It also stream-lines the resulting select_best_block a bit but avoids > adjusting heuristics more with this change. gfortran.dg/streamio_9.f90 > starts execute failing with this on x86_64 with -m32 because the > (float)i * 9....e-7 compute is sunk across a STOP causing it > to be no longer spilled and thus the compare failing due to excess > precision. The patch adds -ffloat-store to avoid this, following > other similar testcases. > > This change doesn't fix the testcase in the PR on itself. > > Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress. > > PR tree-optimization/114589 > * tree-ssa-sink.cc (select_best_block): Remove profile-based > heuristics. Instead reject sink locations that sink > to post-dominators. Move empty latch check here from > statement_sink_location. Also consider early_bb for the > loop depth check. > (statement_sink_location): Remove superfluous check. Remove > empty latch check. > (pass_sink_code::execute): Compute/release post-dominators. > > * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid > excess precision when not spilling. > --- > gcc/testsuite/gfortran.dg/streamio_9.f90 | 1 + > gcc/tree-ssa-sink.cc | 62 > 2 files changed, 20 insertions(+), 43 deletions(-) > > diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 > b/gcc/testsuite/gfortran.dg/streamio_9.f90 > index b6bddb973f8..f29ded6ba54 100644 > --- a/gcc/testsuite/gfortran.dg/streamio_9.f90 > +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90 > @@ -1,4 +1,5 @@ > ! { dg-do run } > +! { dg-options "-ffloat-store" } > ! PR29053 Stream IO test 9. > ! Contributed by Jerry DeLisle . > ! Test case derived from that given in PR by Steve Kargl. > diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc > index 2f90acb7ef4..2188b7523c7 100644 > --- a/gcc/tree-ssa-sink.cc > +++ b/gcc/tree-ssa-sink.cc > @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, > bool *debug_stmts) > > We want the most control dependent block in the shallowest loop nest. > > - If the resulting block is in a shallower loop nest, then use it. Else > - only use the resulting block if it has significantly lower execution > - frequency than EARLY_BB to avoid gratuitous statement movement. We > - consider statements with VOPS more desirable to move. > - > - This pass would obviously benefit from PDO as it utilizes block > - frequencies. It would also benefit from recomputing frequencies > - if profile data is not available since frequencies often get out > - of sync with reality. */ > + If the resulting block is in a shallower loop nest, then use it. */ > > static basic_block > select_best_block (basic_block early_bb, > @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb, > { >basic_block best_bb = late_bb; >basic_block temp_bb = late_bb; > - int threshold; > >while (temp_bb != early_bb) > { > + /* Walk up the dominator tree, hopefully we'll find a shallower > + loop nest. */ > + temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); > + >/* If we've moved into a lower loop nest, then that becomes >our best block. */ >if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) > best_bb = temp_bb; > - > - /* Walk up the dominator tree, hopefully we'll find a shallower > - loop nest. */ > - temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); > } > >/* Placing a statement before a setjmp-like fu
Re: [PATCH 2/3] Enabled LRA for ia64.
On Fri, 14 Jun 2024 at 14:07, Frank Scheiner wrote: > > On 14.06.24 14:53, Jonathan Wakely wrote: > > On Fri, 14 Jun 2024 at 12:07, Frank Scheiner wrote: > >> ...point to two headers which are part of glibc 2.39 (w/ia64 support > >> re-added): > >> > >> * /usr/include/bits/sigcontext.h:32-38: > >> ``` > >> 32 struct __ia64_fpreg > >> 33 { > >> 34 union > >> 35 { > >> 36 unsigned long bits[2]; > >> 37 } u; > >> 38 } __attribute__ ((__aligned__ (16))); > >> ``` > >> > >> * /usr/include/sys/ucontext.h:39-45: > >> ``` > >>39 struct __ia64_fpreg_mcontext > >>40 { > >>41 union > >>42 { > >>43 unsigned long __ctx(bits)[2]; > >>44 } __ctx(u); > >>45 } __attribute__ ((__aligned__ (16))); > >> ``` > > > > This is an ia64-specific glibc bug, because it means the following > > valid C program is rejected: > > > > #define _GNU_SOURCE > > #define bits 111 > > #define u no u > > #include > > int main() { } > > Ok, I see. Should this then be "worked around" like in [1] with > something like: > > ``` > #if defined (__linux__) && defined (__ia64__) > // defines __ia64_fpreg_mcontext::u > #undef u > #endif > ``` > > ...in `libstdc++-v3/testsuite/17_intro/names.cc` or should we fix it in > glibc instead? Both, ideally. The libstdc++ test should definitely be fixed because it fails with released versions of glibc already in the wild. But glibc should also be fixed because it's a standards conformance issue. > > [1]: > https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=cf5f7791056b3ed993bc8024be767a86157514a9 > > Cheers, > Frank >
Re: [PING^2] Re: [PATCH v7 1/9] Improve must tail in RTL backend
Andi Kleen writes: PING^2 > Need reviewers for the tree and middle-end parts, as well as the C frontend. > > Thanks! > > -Andi
Re: [PATCH v4] c++: fix constained auto deduction in templ spec scopes [PR114915]
On Wed, 2024-05-22 at 16:30 -0400, Jason Merrill wrote: > OK, on the right patch this time I hope. > > Looks like you still need either FSF copyright assignment or DCO > certification per https://gcc.gnu.org/contribute.html#legal > Hi. Thanks for your patience. I now have the FSF copyright assignment. > On 5/15/24 13:27, Seyed Sajad Kahani wrote: > > This patch resolves PR114915 by replacing the logic that fills in > > the > > missing levels in do_auto_deduction in cp/pt.cc. > > I miss the text in your original patch that explained the problem > more. > The patch that I will send in the upcoming email contains the explanation from the v1 patch as well. > > The new approach now trims targs if the depth of targs is deeper > > than desired > > (this will only happen in specific contexts), and still fills targs > > with empty > > layers if it has fewer depths than expected. > > > > PR c++/114915 > > This line needs to start with a tab. > > > gcc/cp/ChangeLog: > > > > * pt.cc (do_auto_deduction): Handle excess outer template > > arguments during constrained auto satisfaction. > > This one, too. These issues are flagged by git gcc-verify, and are > easier to avoid with git gcc-commit-mklog. > Thanks for guiding me. The commit is now verified. > > gcc/testsuite/ChangeLog: > > > > * g++.dg/cpp2a/concepts-placeholder14.C: New test. > > * g++.dg/cpp2a/concepts-placeholder15.C: New test. > > This test still needs a variable template partial specialization. > Regarding this, I have added a simple variable template parameter specialization to the test, but due to PR c++/115030, which I will be working on right after this patch, a more complex test will fail. > A few coding style nits below. > > > * g++.dg/cpp2a/concepts-placeholder16.C: New test. > > --- > > gcc/cp/pt.cc | 20 --- > > .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +++ > > .../g++.dg/cpp2a/concepts-placeholder15.C | 15 + > > .../g++.dg/cpp2a/concepts-placeholder16.C | 33 > > +++ > > 4 files changed, 83 insertions(+), 4 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts- > > placeholder14.C > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts- > > placeholder15.C > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts- > > placeholder16.C > > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > > index 32640f8e9..ecfda67aa 100644 > > --- a/gcc/cp/pt.cc > > +++ b/gcc/cp/pt.cc > > @@ -31253,6 +31253,19 @@ do_auto_deduction (tree type, tree init, > > tree auto_node, > > full_targs = add_outermost_template_args (tmpl, > > full_targs); > > full_targs = add_to_template_args (full_targs, targs); > > > > + int want = TEMPLATE_TYPE_ORIG_LEVEL (auto_node); > > + int have = TMPL_ARGS_DEPTH (full_targs); > > + > > + if (want < have) > > + { > > + // if a constrained auto is declared in an explicit > > specialization > > We generally use C-style /* */ comments, that start with a capital > letter and end with a period. > > > + gcc_assert (context == adc_variable_type || context == > > adc_return_type > > + || context == adc_decomp_type); > > The || should line up with the 'c' on the previous line. > > > + tree trimmed_full_args = get_innermost_template_args > > + (full_targs, want); > > We try to avoid having arguments to the left of the function name; > here > I'd start the new line with the = instead. > > > + full_targs = trimmed_full_args; > > + } > > + > > Unnecessary tab on this line. > > > /* HACK: Compensate for callers not always communicating > > all levels of > > outer template arguments by filling in the outermost > > missing levels > > with dummy levels before checking satisfaction. We'll > > still crash > > @@ -31260,11 +31273,10 @@ do_auto_deduction (tree type, tree init, > > tree auto_node, > > these missing levels, but this hack otherwise allows us > > to handle a > > large subset of possible constraints (including all non- > > dependent > > constraints). */ > > - if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL > > (auto_node) > > - - TMPL_ARGS_DEPTH (full_targs))) > > + if (want > have) > > { > > - tree dummy_levels = make_tree_vec (missing_levels); > > - for (int i = 0; i < missing_levels; ++i) > > + tree dummy_levels = make_tree_vec (want - have); > > + for (int i = 0; i < want - have; ++i) > > TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0); > > full_targs = add_to_template_args (dummy_levels, > > full_targs); > > } > > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C > > b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C > > new file mode 100644 > > index 0..fcdbd7608 > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C > > @@ -0,0 +1
[PATCH v5] c++: fix constained auto deduction in templ spec scopes [PR114915]
When deducing auto for `adc_return_type`, `adc_variable_type`, and `adc_decomp_type` contexts (at the usage time), we try to resolve the outermost template arguments to be used for satisfaction. This is done by one of the following, depending on the scope: 1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and extracting DECL_TI_ARGS from it for function scope deductions (pt.cc:31236). 2. Checking the `DECL_TEMPLATE_INFO` of the declaration (alongside with other conditions) for non-function scope variable declaration deductions (decl.cc:8527). Then, we do not retrieve the deeper layers of the template arguments; instead, we fill the missing levels with dummy levels (pt.cc:31260). The problem (that is shown in PR114915) is that we do not consider the case where the deduction happens in a template specialization scope. In this case, the type is not dependent on the outermost template arguments (which are the specialization arguments). Yet, we still resolve the outermost template arguments, and then the number of layers in the template arguments exceeds the number of levels in the type. This causes the missing levels to be negative. This leads to the rejection of valid code and ICEs (like segfault) in the release mode. In the debug mode, it is possible to show as an assertion failure (when creating a tree_vec with a negative size). This patch resolves PR114915 by replacing the logic that fills in the missing levels in do_auto_deduction in cp/pt.cc. The new approach now trims targs if the depth of targs is deeper than desired (this will only happen in specific contexts), and still fills targs with empty layers if it has fewer depths than expected. PR c++/114915 gcc/cp/ChangeLog: * pt.cc (do_auto_deduction): Handle excess outer template arguments during constrained auto satisfaction. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-placeholder14.C: New test. * g++.dg/cpp2a/concepts-placeholder15.C: New test. * g++.dg/cpp2a/concepts-placeholder16.C: New test. --- gcc/cp/pt.cc | 20 --- .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +++ .../g++.dg/cpp2a/concepts-placeholder15.C | 26 +++ .../g++.dg/cpp2a/concepts-placeholder16.C | 33 +++ 4 files changed, 94 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 32640f8e9..2206d9ffe 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -31253,6 +31253,19 @@ do_auto_deduction (tree type, tree init, tree auto_node, full_targs = add_outermost_template_args (tmpl, full_targs); full_targs = add_to_template_args (full_targs, targs); + int want = TEMPLATE_TYPE_ORIG_LEVEL (auto_node); + int have = TMPL_ARGS_DEPTH (full_targs); + + if (want < have) + { + /* If a constrained auto is declared in an explicit specialization. */ + gcc_assert (context == adc_variable_type || context == adc_return_type + || context == adc_decomp_type); + tree trimmed_full_args + = get_innermost_template_args (full_targs, want); + full_targs = trimmed_full_args; + } + /* HACK: Compensate for callers not always communicating all levels of outer template arguments by filling in the outermost missing levels with dummy levels before checking satisfaction. We'll still crash @@ -31260,11 +31273,10 @@ do_auto_deduction (tree type, tree init, tree auto_node, these missing levels, but this hack otherwise allows us to handle a large subset of possible constraints (including all non-dependent constraints). */ - if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL (auto_node) - - TMPL_ARGS_DEPTH (full_targs))) + if (want > have) { - tree dummy_levels = make_tree_vec (missing_levels); - for (int i = 0; i < missing_levels; ++i) + tree dummy_levels = make_tree_vec (want - have); + for (int i = 0; i < want - have; ++i) TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0); full_targs = add_to_template_args (dummy_levels, full_targs); } diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C new file mode 100644 index 0..fcdbd7608 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C @@ -0,0 +1,19 @@ +// PR c++/114915 +// { dg-do compile { target c++20 } } + +template +concept C = __is_same(T, int); + +template +void f() { +} + +template<> +void f() { + C auto x = 1; +} + +int main() { + f(); + return 0; +} diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholde
RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
Committed with those changes and test suites passed. Pan -Original Message- From: Li, Pan2 Sent: Friday, June 14, 2024 4:15 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB Thanks Richard for comments. > :c shouldn't be necessary on the plus > or on the bit_xor > OK with those changes. Will remove the :c and commit it if there is no surprise from test suites. Pan -Original Message- From: Richard Biener Sent: Friday, June 14, 2024 4:05 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB On Wed, Jun 12, 2024 at 2:38 PM wrote: > > From: Pan Li > > After we support the scalar unsigned form 1 and 2, we would like > to introduce more forms include the branch and branchless. There > are forms 3-10 list as below: > > Form 3: > #define SAT_SUB_U_3(T) \ > T sat_sub_u_3_##T (T x, T y) \ > { \ > return x > y ? x - y : 0; \ > } > > Form 4: > #define SAT_SUB_U_4(T) \ > T sat_sub_u_4_##T (T x, T y) \ > { \ > return x >= y ? x - y : 0; \ > } > > Form 5: > #define SAT_SUB_U_5(T) \ > T sat_sub_u_5_##T (T x, T y) \ > { \ > return x < y ? 0 : x - y; \ > } > > Form 6: > #define SAT_SUB_U_6(T) \ > T sat_sub_u_6_##T (T x, T y) \ > { \ > return x <= y ? 0 : x - y; \ > } > > Form 7: > #define SAT_SUB_U_7(T) \ > T sat_sub_u_7_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return ret & (T)(overflow - 1); \ > } > > Form 8: > #define SAT_SUB_U_8(T) \ > T sat_sub_u_8_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return ret & (T)-(!overflow); \ > } > > Form 9: > #define SAT_SUB_U_9(T) \ > T sat_sub_u_9_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return overflow ? 0 : ret; \ > } > > Form 10: > #define SAT_SUB_U_10(T) \ > T sat_sub_u_10_##T (T x, T y) \ > { \ > T ret; \ > T overflow = __builtin_sub_overflow (x, y, &ret); \ > return !overflow ? ret : 0; \ > } > > Take form 10 as example: > > SAT_SUB_U_10(uint64_t); > > Before this patch: > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y) > { > unsigned char _1; > unsigned char _2; > uint8_t _3; > __complex__ unsigned char _6; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _6 = .SUB_OVERFLOW (x_4(D), y_5(D)); > _2 = IMAGPART_EXPR <_6>; > if (_2 == 0) > goto ; [50.00%] > else > goto ; [50.00%] > ;;succ: 3 > ;;4 > > ;; basic block 3, loop depth 0 > ;;pred: 2 > _1 = REALPART_EXPR <_6>; > ;;succ: 4 > > ;; basic block 4, loop depth 0 > ;;pred: 2 > ;;3 > # _3 = PHI <0(2), _1(3)> > return _3; > ;;succ: EXIT > > } > > After this patch: > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y) > { > uint8_t _3; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call] > return _3; > ;;succ: EXIT > > } > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test with newlib. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap test. > 4. The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: Add more match for unsigned sat_sub. > * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new > func impl to match phi node for .SAT_SUB. > (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB > for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 25 +++-- > gcc/tree-ssa-math-opts.cc | 33 + > 2 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5cfe81e80b3..66e411b3359 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > /* Unsigned saturation sub, case 1 (branch with gt): > SAT_U_SUB = X > Y ? X - Y : 0 */ > (match (unsigned_integer_sat_sub @0 @1) > - (cond (gt @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > /* Unsigned saturation sub, case 2 (branch with ge): > SAT_U_SUB = X >= Y ? X - Y : 0. */ > (match (unsigned_integer_sat_sub @0 @1) > - (cond (ge @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& typ
Re: [PATCH] analyzer: Fix g++ 4.8 bootstrap without using std::move to return std::unique_ptr
On Fri, 2024-06-14 at 13:57 +0100, Jonathan Wakely wrote: > This builds successfully using both gcc-4.8.5 and gcc-13.3.1 as host > compiler. > > OK for trunk if testing succeeds? Yes, thanks! Dave
[committed] libstdc++: Fix declaration of posix_memalign for freestanding
Tested x86_64-linux with --disable-hosted-libstdcxx where it is needed to fix the build. Pushed to trunk, backports needed too. -- >8 -- Thanks to Jérôme Duval for noticing this. libstdc++-v3/ChangeLog: * libsupc++/new_opa.cc [!_GLIBCXX_HOSTED]: Fix declaration of posix_memalign. --- libstdc++-v3/libsupc++/new_opa.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc index 35606e1c1b3..c7c7e794623 100644 --- a/libstdc++-v3/libsupc++/new_opa.cc +++ b/libstdc++-v3/libsupc++/new_opa.cc @@ -47,7 +47,7 @@ using std::size_t; extern "C" { # if _GLIBCXX_HAVE_POSIX_MEMALIGN - void *posix_memalign(void **, size_t alignment, size_t size); + int posix_memalign(void **, size_t alignment, size_t size); # elif _GLIBCXX_HAVE_ALIGNED_ALLOC void *aligned_alloc(size_t alignment, size_t size); # elif _GLIBCXX_HAVE__ALIGNED_MALLOC -- 2.45.1
[Fortran, Patch, PR 96992] Fix Class arrays of different ranks are rejected as storage association argument
Hi all, I somehow got assigned to this PR so I fixed it. GFortran was ICEing because of the ASSUME_RANK in a derived to class conversion. After fixing this, storage association was producing segfaults. The "shape conversion" of the class array as dummy argument was not initializing the dim 0 stride and with that grabbing into the memory somewhere. This is now fixed and regtests fine on x86_64 Fedora 39. Ok for mainline? I assume this patch could be fixing some other PRs with class array's parameter passing, too. If that sounds familiar, feel free to point me to them. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 86ac3179e1314ca1c41f52025c5a156ad7346dc1 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Fri, 14 Jun 2024 16:54:37 +0200 Subject: [PATCH] Fortran: [PR96992] Fix rejecting class arrays of different ranks as storage association argument. Removing the assert in trans-expr, lead to initial strides not set which is not fixed. PR fortran/96992 gcc/fortran/ChangeLog: * trans-array.cc (gfc_trans_array_bounds): Set a starting stride, when descriptor expects a variable for the stride. (gfc_trans_dummy_array_bias): Allow storage association for dummy class arrays, when they are not elemental. * trans-expr.cc (gfc_conv_derived_to_class): Remove assert to allow converting derived to class type arrays with assumend rank. gcc/testsuite/ChangeLog: * gfortran.dg/pr96992.f90: New test. --- gcc/fortran/trans-array.cc| 7 ++- gcc/fortran/trans-expr.cc | 2 - gcc/testsuite/gfortran.dg/pr96992.f90 | 61 +++ 3 files changed, 67 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/pr96992.f90 diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index b3088a892c8..9fa8bad2f35 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -6798,6 +6798,9 @@ gfc_trans_array_bounds (tree type, gfc_symbol * sym, tree * poffset, size = gfc_index_one_node; offset = gfc_index_zero_node; + stride = GFC_TYPE_ARRAY_STRIDE (type, 0); + if (stride && VAR_P (stride)) +gfc_add_modify (pblock, stride, gfc_index_one_node); for (dim = 0; dim < as->rank; dim++) { /* Evaluate non-constant array bound expressions. @@ -7134,7 +7137,9 @@ gfc_trans_dummy_array_bias (gfc_symbol * sym, tree tmpdesc, || (is_classarray && CLASS_DATA (sym)->attr.allocatable)) return; - if (!is_classarray && sym->attr.dummy && gfc_is_nodesc_array (sym)) + if ((!is_classarray + || (is_classarray && CLASS_DATA (sym)->as->type == AS_EXPLICIT)) + && sym->attr.dummy && !sym->attr.elemental && gfc_is_nodesc_array (sym)) { gfc_trans_g77_array (sym, block); return; diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 0796fb75505..4bb62cfb1ad 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -903,8 +903,6 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e, if (e->rank != class_ts.u.derived->components->as->rank) { - gcc_assert (class_ts.u.derived->components->as->type - == AS_ASSUMED_RANK); if (derived_array && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (parmse->expr))) { diff --git a/gcc/testsuite/gfortran.dg/pr96992.f90 b/gcc/testsuite/gfortran.dg/pr96992.f90 new file mode 100644 index 000..c56ed80f394 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr96992.f90 @@ -0,0 +1,61 @@ +! { dg-do run } + +! PR fortran/96992 + +! Contributed by Thomas Koenig + +! From the standard: +! An actual argument that represents an element sequence and +! corresponds to a dummy argument that is an array is sequence +! associated with the dummy argument. The rank and shape of the +! actual argument need not agree with the rank and shape of the +! dummy argument, but the number of elements in the dummy argument +! shall not exceed the number of elements in the element sequence +! of the actual argument. If the dummy argument is assumed-size, +! the number of elements in the dummy argument is exactly +! the number of elements in the element sequence. + +! Check that walking the sequence starts with an initialized stride +! for dim == 0. + +module foo_mod + implicit none + type foo + integer :: i + end type foo +contains + subroutine d1(x,n) +integer, intent(in) :: n +integer :: i +class (foo), intent(out), dimension(n) :: x +select type(x) +class is(foo) + x(:)%i = (/ (42 + i, i = 1, n ) /) +class default + stop 1 +end select + end subroutine d1 + subroutine d2(x,n) +integer, intent(in) :: n +integer :: i +class (foo), intent(in), dimension(n,n,n) :: x +select type (x) +class is (foo) + print *,x%i + if ( any( x%i /= reshape((/ (42 + i, i = 1, n ** 3 ) /), [n, n, n] ))) stop 2 +class default + stop 3 +end select + end subroutine d2 +end module foo_mod +program main + use foo_mod + i
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote: > > The following retires vcond{,u,eq} optabs by stopping to use them > from the middle-end. Targets instead (should) implement vcond_mask > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > possibly affected targets - those implementing these patterns, > and in particular it lists mips, sparc and ia64 as targets that > most definitely will regress while others might simply remove > their vcond{,u,eq} patterns. > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > I know riscv doesn't implement any of the legacy optabs. But less > maintained vector targets might need adjustments. > At GCC14, I tried to remove these expanders in the x86 backend, and it regressed some testcases, mainly because of the optimizations we did in ix86_expand_{int,fp}_vcond. I've started testing your patch, it's possible that we still need to move the ix86_expand_{int,fp}_vcond optimizations to the middle-end(isel or match.pd)or add extra patterns to handle it at the rtl pas_combine. -- BR, Hongtao
[committed] libstdc++: Make std::type_info::operator== always_inline for C++23 [PR110572]
Tested x86_64-linux, and lightly tested on x86_64-w64-mingw32 too, where it fixes the linker error. Pushed to trunk. Backports needed for 12, 13 and 14. -- >8 -- Commit r12-6266-g3633cc54284450 implemented P1328 for C++23, making std::type_info::operator== usable in constant expressions. For targets such as mingw-w64 where that function was not previously inline, making it constexpr required making it inline for C++23 and later. For statically linked programs this can result in multiple definition errors, because there's a non-inline definition in libstdc++.a as well. For those targets make it always_inline for C++23, so that there is no symbol generated for the inline definition, and the non-inline definition in libstdc++.a will be the only definition. libstdc++-v3/ChangeLog: PR libstdc++/110572 * libsupc++/typeinfo (type_info::operator==): Add always_inline attribute for targets where the ABI requries equality to be non-inline. * testsuite/18_support/type_info/110572.cc: New test. --- libstdc++-v3/libsupc++/typeinfo | 3 +++ libstdc++-v3/testsuite/18_support/type_info/110572.cc | 11 +++ 2 files changed, 14 insertions(+) create mode 100644 libstdc++-v3/testsuite/18_support/type_info/110572.cc diff --git a/libstdc++-v3/libsupc++/typeinfo b/libstdc++-v3/libsupc++/typeinfo index fcc3077d060..35e72bb18ee 100644 --- a/libstdc++-v3/libsupc++/typeinfo +++ b/libstdc++-v3/libsupc++/typeinfo @@ -188,6 +188,9 @@ namespace std #endif #if __GXX_TYPEINFO_EQUALITY_INLINE || __cplusplus > 202002L +# if ! __GXX_TYPEINFO_EQUALITY_INLINE + [[__gnu__::__always_inline__]] +# endif _GLIBCXX23_CONSTEXPR inline bool type_info::operator==(const type_info& __arg) const _GLIBCXX_NOEXCEPT { diff --git a/libstdc++-v3/testsuite/18_support/type_info/110572.cc b/libstdc++-v3/testsuite/18_support/type_info/110572.cc new file mode 100644 index 000..64081879b77 --- /dev/null +++ b/libstdc++-v3/testsuite/18_support/type_info/110572.cc @@ -0,0 +1,11 @@ +// { dg-options "-static-libstdc++" } +// { dg-require-static-libstdcxx } +// { dg-require-cpp-feature-test __cpp_rtti } +// { dg-do link } + +#include + +int main() +{ + return typeid(0) == typeid(0u); +} -- 2.45.1
Re: [PATCH] Enhance if-conversion for automatic arrays
On Fri, Jun 14, 2024 at 5:54 AM Richard Biener wrote: > > Automatic arrays that are not address-taken should not be subject to > store data races. That seems conservative enough. Though I would think if the array never escaped the function would be still correct and allow for more arrays (but maybe that is not tracked). Thanks, Andrew Pinski > This applies to OMP SIMD in-branch lowered > functions result array which for the testcase otherwise prevents > vectorization with SSE and for AVX and AVX512 ends up with spurious > .MASK_STORE to the stack surviving. > > This inefficiency was noted in PR111793. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. Is my > idea of store data races correct? At least phiopt uses the same > check but for example LIM doesn't special-case locals. > > PR tree-optimization/111793 > * tree-if-conv.cc (ifcvt_memrefs_wont_trap): For stores > that do not trap only consider -fstore-data-races when > the underlying object is not automatic or has its address > taken. > > * gcc.dg/vect/vect-simd-clone-21.c: New testcase. > --- > gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c | 16 > gcc/tree-if-conv.cc| 13 +++-- > 2 files changed, 27 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c > b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c > new file mode 100644 > index 000..49c52fb59bd > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_simd_clones } */ > +/* { dg-additional-options "-fopenmp-simd" } */ > + > +#pragma omp declare simd simdlen(4) inbranch > +__attribute__((noinline)) int > +foo (int a, int b) > +{ > + return a + b; > +} > + > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target > i?86-*-* x86_64-*-* } } } */ > +/* if-conversion shouldn't need to resort to masked stores for the result > + array created by OMP lowering since that's automatic and does not have > + its address taken. */ > +/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */ > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > index c4c3ed41a44..974c614edf3 100644 > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -934,14 +934,23 @@ ifcvt_memrefs_wont_trap (gimple *stmt, > vec drs) >if (DR_IS_READ (a)) > return true; > > + bool ok = flag_store_data_races; > + base = get_base_address (base); > + if (DECL_P (base) > + && auto_var_in_fn_p (base, cfun->decl) > + && ! may_be_aliased (base)) > + /* Automatic variables not aliased are not subject to > + data races. */ > + ok = true; > + >/* an unconditionaly write won't trap if the base is written > to unconditionally. */ >if (base_master_dr > && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) > - return flag_store_data_races; > + return ok; >/* or the base is known to be not readonly. */ >else if (base_object_writable (DR_REF (a))) > - return flag_store_data_races; > + return ok; > } > >return false; > -- > 2.35.3
Re: [PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE
Ok, I understand better now. But if those macros are supposed to be replaced by hook functions, could you make that replacement part of the proposed patch? paul > On Jun 13, 2024, at 11:22 PM, Kewen.Lin wrote: > > Hi Paul, > > on 2024/6/14 04:07, Paul Koning wrote: >> What is the effect of this change? The original code intended to have >> "float" mean a 32 bit value, and "double" a 64 bit value. There aren't any >> larger floats, so I defined the long double size as 64 also. Is the right >> answer not to define it? > > Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE, > target code building will fail > if it still has these macros. As I'd like to squash these target changes > onto 09/52, so I didn't note > the background/context here, sorry about that. > >> >> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE >> and DOUBLE_TYPE_SIZE without explanation and without mention in the >> changelog? > > Oops, thanks for catching! I just noticed this sub-patch has inconsistent > subject & changelog, I should > have noticed this as it has a quite different subject from the others. :( > With your finding, I just > re-visited all the other sub-patches, luckily they are consistent. > > The below is the updated revision, hope it looks good to you. Thanks again. > > BR, > Kewen > - > > Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE > > This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE > defines in pdp11 port, as we want to replace these macros > with hook mode_for_floating_type and poison them. > > gcc/ChangeLog: > >* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove. >(DOUBLE_TYPE_SIZE): Likewise. >(LONG_DOUBLE_TYPE_SIZE): Likewise. > --- > gcc/config/pdp11/pdp11.h | 11 --- > 1 file changed, 11 deletions(-) > > diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h > index 2446fea0b58..6c8e045bc57 100644 > --- a/gcc/config/pdp11/pdp11.h > +++ b/gcc/config/pdp11/pdp11.h > @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3. If not see > #define LONG_TYPE_SIZE 32 > #define LONG_LONG_TYPE_SIZE64 > > -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64, > - but that conflicts with Fortran language rules. Since there is no > - obvious reason why we should have that feature -- other targets > - generally don't have float and double the same size -- I've removed > - it. Note that it continues to be true (for now) that arithmetic is > - always done with 64-bit values, i.e., the FPU is always in "double" > - mode. */ > -#define FLOAT_TYPE_SIZE32 > -#define DOUBLE_TYPE_SIZE 64 > -#define LONG_DOUBLE_TYPE_SIZE 64 > - > /* machine types from ansi */ > #define SIZE_TYPE "short unsigned int" /* definition of size_t */ > #define WCHAR_TYPE "short int" /* or long int */ > -- > 2.43.0 > >
[r15-1330 Regression] FAIL: gcc.target/i386/vect-strided-3.c scan-assembler-times movq 4 on Linux/x86_64
On Linux/x86_64, 1438b15e5430f7fab3832c35d262d6b58caba469 is the first bad commit commit 1438b15e5430f7fab3832c35d262d6b58caba469 Author: Richard Biener Date: Fri Jun 14 11:31:53 2024 +0200 Adjust gcc.target/i386/vect-strided-3.c caused FAIL: gcc.target/i386/vect-strided-3.c scan-assembler-times movq 4 with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1330/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-strided-3.c --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at haochen dot jiang at intel.com.) (If you met problems with cascadelake related, disabling AVX512F in command line might save that.) (However, please make sure that there is no potential problems with AVX512.)
Re: [PATCH V2 1/2] RISC-V: Fix vwsll combine on rv32 targets
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md > index 6a2eabbd854..29916adb62b 100644 > --- a/gcc/config/riscv/autovec-opt.md > +++ b/gcc/config/riscv/autovec-opt.md > @@ -1517,8 +1517,7 @@ (define_insn_and_split "*vwsll_zext1_scalar_" >"&& 1" >[(const_int 0)] >{ > -if (GET_CODE (operands[2]) == SUBREG) > - operands[2] = SUBREG_REG (operands[2]); > +operands[2] = gen_lowpart (Pmode, operands[2]); > insn_code icode = code_for_pred_vwsll_scalar (mode); > riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands); > DONE; Please also correct it in the other expander (_trunc_). OK with that change if the testsuite still looks ok :) Regards Robin
Re: [PATCH V2 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn
OK. Regards Robin
[patch,avr,applied] PR115419: round double to even LSB on tie breaks
The convention in round-to-nearest floating-point rounding is to round to even ("Banker's rounding"). Johann AVR: target/115419 - Tie breaks are rounded-to-even. libgcc/config/avr/libf7/ PR target/115419 * libf7.c (f7_get_double): Round tie breaks to even LSB. diff --git a/libgcc/config/avr/libf7/libf7.c b/libgcc/config/avr/libf7/libf7.c index 375becb854c..6fae4fc1a2d 100644 --- a/libgcc/config/avr/libf7/libf7.c +++ b/libgcc/config/avr/libf7/libf7.c @@ -440,11 +440,21 @@ f7_double_t f7_get_double (const f7_t *aa) mant &= 0x00ff; - // FIXME: For subnormals, rounding is premature and should be - // done *after* the mantissa has been shifted into place - // (or the round value be shifted left accordingly). - // Round. - mant += 1u << (F7_MANT_BITS - (1 + DBL_DIG_MANT) - 1); + // PR115419: The least significant nibble tells how to round: + // Tie breaks are rounded to even (Banker's rounding). + uint8_t lsn = mant & 0xff; + lsn &= 0xf; + // The LSB of the outgoing double is at bit 3. + if (lsn & (1 << 3)) +++lsn; + if (lsn > (1 << 2)) +{ + // FIXME: For subnormals, rounding is premature and should be + //done *after* the mantissa has been shifted into place + //(or the round value be shifted left accordingly). + // Round. + mant += 1u << (F7_MANT_BITS - (1 + DBL_DIG_MANT) - 1); +} uint8_t dex; register uint64_t r18 __asm ("r18") = mant;
Re: [PATCH] c++: remove Concepts TS code
On Wed, Jun 12, 2024 at 03:08:27PM -0400, Patrick Palka wrote: > FWIW I think we can also remove DECL_DECLARED_CONCEPT_P, > function_concept_p and variable_concept_p (and related code in > grokfndecl and grokvardecl probably). And e.g. code in constraint.cc > that checks for FUNCTION_DECL, VAR_*, CALL_EXPR, OVERLOAD / OVL_* etc. Thanks. I'm leaving this as a follow-up; the patch is already quite large. Marek
Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV
On Fri, 14 Jun 2024, Kong, Lingling wrote: > APX CFCMOV[1] feature implements conditionally faulting which means that all > memory faults are suppressed > when the condition code evaluates to false and load or store a memory > operand. Now we could load or store a > memory operand may trap or fault for conditional move. > > In middle-end, now we don't support a conditional move if we knew that a load > from A or B could trap or fault. Predicated loads&stores on Itanium don't trap either. They are modeled via COND_EXEC on RTL. The late if-conversion pass (the instance that runs after reload) is capable of introducing them. > To enable CFCMOV, we add a target HOOK TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > in if-conversion pass to allow convert to cmov. Considering the above, is the new hook really necessary? Can you model the new instructions via (cond_exec () (set ...)) instead of (set (if_then_else ...)) ? Alexander
Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV
On 6/14/24 11:10 AM, Alexander Monakov wrote: On Fri, 14 Jun 2024, Kong, Lingling wrote: APX CFCMOV[1] feature implements conditionally faulting which means that all memory faults are suppressed when the condition code evaluates to false and load or store a memory operand. Now we could load or store a memory operand may trap or fault for conditional move. In middle-end, now we don't support a conditional move if we knew that a load from A or B could trap or fault. Predicated loads&stores on Itanium don't trap either. They are modeled via COND_EXEC on RTL. The late if-conversion pass (the instance that runs after reload) is capable of introducing them. To enable CFCMOV, we add a target HOOK TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP in if-conversion pass to allow convert to cmov. Considering the above, is the new hook really necessary? Can you model the new instructions via (cond_exec () (set ...)) instead of (set (if_then_else ...)) ? Note that turning on cond_exec will turn off some of the cmove support. But the general suggesting of trying to avoid a hook for this is a good one. In fact, my first reaction to this thread was "do we really need a hook for this". jeff
[PATCH] libcpp, c-family: Add (dumb) C23 N3017 #embed support [PR105863]
Hi! The following patch implements the C23 N3017 "#embed - a scannable, tooling-friendly binary resource inclusion mechanism" paper. The implementation is intentionally dumb, in that it doesn't significantly speed up compilation of larger initializers and doesn't make it possible to use huge #embeds (like several gigabytes large, that is compile time and memory still infeasible). There are 2 reasons for this. One is that I think like it is implemented now in the patch is how we should use it for the smaller #embed sizes, dunno with which boundary, whether 32 bytes or 64 or something like that, certainly handling the single byte cases which is something that can appear anywhere in the source where constant integer literal can appear is desirable and I think for a few bytes it isn't worth it to come up with something smarter and users would like to e.g. see it in -E readably as well (perhaps the slow vs. fast boundary should be determined by command line option). And the other one is to be able to more easily find regressions in behavior caused by the optimizations, so we have something to get back in git to compare against. I'm definitely willing to work on the optimizations (likely introduce a new CPP_* token type to refer to a range of libcpp owned memory (start + size) and similarly some tree which can do the same, and can be at any time e.g. split into 2 subparts + say INTEGER_CST in between if needed say for const unsigned char d[] = { #embed "2GB.dat" prefix (0, 0, ) suffix (, [0x4000] = 42) }; still without having to copy around huge amounts of data; STRING_CST owns the memory it points to and can be only 2GB in size), but would like to do that incrementally. And would like to first include some extensions also not included in this patch, like gnu::offset (off) parameter to allow to skip certain constant amount of bytes at the start of the files, plus gnu::base64 ("base64_encoded_data") parameter to add something which can store more efficiently large amounts of the #embed data in preprocessed source. I've been cross-checking all the tests also against the LLVM implementation https://github.com/llvm/llvm-project/pull/68620 which has been for a few hours even committed to LLVM trunk but reverted afterwards. The patch uses --embed-dir= option that clang plans to add above and doesn't use other variants on the search directories yet, plus there are no default directories at least for the time being where to search for embed files. So, #embed "..." works if it is found in the same directory (or relative to the current file's directory) and #embed "/..." or #embed work always, but relative #embed <...> doesn't unless at least one --embed-dir= is specified. There is no reason to differentiate between system and non-system directories, so we don't need -isystem like counterpart, perhaps -iquote like counterpart could be useful in the future, dunno what else. Should we have --embed-directory= as alias to --embed-dir=? There are some differences beyond clang ICEs, so I'd like to point them out to make sure there is agreement on the choices in the patch. They are also mentioned in the comments of the llvm pull request. The most important is that the GCC patch (as well as the original thephd.dev LLVM branch on godbolt) expands #embed (or acts as if it is expanded) into a mere sequence of numbers like 123,2,35,26 rather then what clang effectively treats as (unsigned char)123,(unsigned char)2,(unsigned char)35,(unsigned char)26 but only does that when using integrated preprocessor, not when using -save-temps where it acts as GCC. JeanHeyd as the original author agrees that is how it is currently worded in C23. Another difference (not tested in the testsuite, not sure how to check for effective target /dev/urandom nor am sure it is desirable to check that during testsuite) is how to treat character devices, named pipes etc. (block devices are errored on). The original paper uses /dev/urandom in various examples and seems to assume that unlike regular files the devices aren't really cached, so #embed limit(1) prefix(int a = ) suffix(;) #embed limit(1) prefix(int b = ) suffix(;) usually results in a != b. That is what the godbolt thephd.dev branch implements too and what this patch does as well, but clang actually seems to just go from st.st_size == 0, ergo it must be zero-sized resource and so just copies over if_empty if present. It is really questionable what to do about the character devices/named pipes with __has_embed, for regular files the patch doesn't read anything from them, relies on st.st_size + limit for whether it is empty or non-empty. But I don't know of a way to check if read on say a character device would read anything or not (the limit (1) vs. limit (1) cases), and if we read something, that would be better cached for later because #embed later if it reads again could read no further data even when it first read something. So, the patch currently for __has_embed just a
[PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]
The attached patch fixes 115481 and 111639. It was tested against avr-libc 2.0.0 (which has *f as macros), 2.1.0 (which has *f as functions, but no long double), and 2.2.0 (which has long double functions). Only the libstdc++ build was tested, not the resulting library. Detlef commit 9112ae579f73b249e9515d5ac9ddc3a76b067c39 Author: Detlef Vollmann Date: Fri Jun 14 14:34:31 2024 +0200 libstdc++-v3: detect math functions when using avr-libc and handle macros Different versions of avr-libc have different definitions of math functions: - 2.0.0 and earlier have float versions as macros - 2.1.0 has float versions as proper functions, but no long double versions - 2.2.0 has long double versions This commit tells configure to always check if these functions are available. It also #undef's any macros. libstdc++-v3/ChangeLog: * crossconfig.m4 [avr*-*-*]: Add compile-checks for float and long double functions. * configure: Regenerate. * src/c++98/math_stubs_float.cc: #undef any macros. diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure index 5645e991af7..17dbae7bd87 100755 --- a/libstdc++-v3/configure +++ b/libstdc++-v3/configure @@ -5080,7 +5080,7 @@ else We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; @@ -5126,7 +5126,7 @@ else We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; @@ -5150,7 +5150,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; @@ -5195,7 +5195,7 @@ else We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; @@ -5219,7 +5219,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; @@ -28628,51 +28628,2354 @@ case "${host}" in ;; avr*-*-*) -$as_echo "#define HAVE_ACOSF 1" >>confdefs.h -$as_echo "#define HAVE_ASINF 1" >>confdefs.h -$as_echo "#define HAVE_ATAN2F 1" >>confdefs.h -$as_echo "#define HAVE_ATANF 1" >>confdefs.h + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for acosf declaration" >&5 +$as_echo_n "checking for acosf declaration... " >&6; } +if ${glibcxx_cv_func_acosf_use+:} false; then : + $as_echo_n "(cached) " >&6 +else + + + ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu + + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +#include +#ifdef HAVE_IEEEFP_H +# include +#endif +#undef acosf + +int +main () +{ + + void (*f)(void) = (void (*)(void))acosf; + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + glibcxx_cv_func_acosf_use=yes + +else + glibcxx_cv_func_acosf_use=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $glibcxx_cv_func_acosf_use" >&5 +$as_echo "$glibcxx_cv_func_acosf_use" >&6; } +
Re: [PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]
On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote: > diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure > index 5645e991af7..17dbae7bd87 100755 > --- a/libstdc++-v3/configure > +++ b/libstdc++-v3/configure > @@ -5080,7 +5080,7 @@ else > We can't simply define LARGE_OFF_T to be 9223372036854775807, > since some C++ compilers masquerading as C compilers > incorrectly reject 9223372036854775807. */ > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) > +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << > 31)) > int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 > && LARGE_OFF_T % 2147483647 == 1) > ? 1 : -1]; This shouldn't happen. Please regenerate using *vanilla* autoconf-2.69. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]
On Fri, 14 Jun 2024 at 18:45, Xi Ruoyao wrote: > > On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote: > > diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure > > index 5645e991af7..17dbae7bd87 100755 > > --- a/libstdc++-v3/configure > > +++ b/libstdc++-v3/configure > > @@ -5080,7 +5080,7 @@ else > > We can't simply define LARGE_OFF_T to be 9223372036854775807, > > since some C++ compilers masquerading as C compilers > > incorrectly reject 9223372036854775807. */ > > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) > > +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << > > 31)) > >int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 > > && LARGE_OFF_T % 2147483647 == 1) > > ? 1 : -1]; > > This shouldn't happen. Please regenerate using *vanilla* autoconf-2.69. Yes, please, if possible. But I can regenerate it locally before pushing if needed.
[committed] testsuite: Add -Wno-psabi to vshuf-mem.C test
On Mon, Jun 10, 2024 at 10:06:31AM +0200, Andreas Krebbel wrote: > The current implementation assumes to always be invoked with register > operands. For memory operands we even have an instruction > though (vlrep). With the patch we try this first and only if it fails > force the input into a register and continue. > > vec_splats generation fails for single element 128bit types which are > allowed for vec_splat. This is something to sort out with another > patch I guess. > > Bootstrapped and regtested on IBM Z. Committed to mainline. Needs to > be committed to GCC 14 branch as well. The newly added test FAILs on i686-linux. On x86_64-linux make check-g++ RUNTESTFLAGS='--target_board=unix\{-m64,-m32/-msse2,-m32/-mno-sse/-mno-mmx\} dg-torture.exp=vshuf-mem.C' shows that as well. The problem is that without SSE2/MMX the vector is passed differently than normally and so GCC warns about that. -Wno-psabi is the usual way to shut it up. Also wonder about the // { dg-additional-options "-march=z14" { target s390*-*-* } } line, doesn't that mean the test will FAIL on all pre-z14 HW? Shouldn't it use some z14_runtime or similar effective target, or check in main (in that case copied over to g++.target/s390) whether z14 instructions can be actually used at runtime? Tested on x86_64-linux, committed to trunk as obvious. 2024-06-14 Jakub Jelinek * g++.dg/torture/vshuf-mem.C: Add -Wno-psabi to dg-options. --- gcc/testsuite/g++.dg/torture/vshuf-mem.C.jj 2024-06-14 19:45:09.116781920 +0200 +++ gcc/testsuite/g++.dg/torture/vshuf-mem.C2024-06-14 19:56:08.744135867 +0200 @@ -1,4 +1,4 @@ -// { dg-options "-std=c++11" } +// { dg-options "-std=c++11 -Wno-psabi" } // { dg-do run } // { dg-additional-options "-march=z14" { target s390*-*-* } } Jakub
Re: [PATCH v3] Target-independent store forwarding avoidance.
On 6/13/24 5:32 AM, Manolis Tsamis wrote: This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this: strbw2, [x1, 1] ldr x0, [x1] # Expensive store forwarding to larger load. To: ldr x0, [x1] strbw2, [x1] bfi x0, x2, 0, 8 Assembly like this can appear with bitfields or type punning / unions. On stress-ng when running the cpu-union microbenchmark the following speedups have been observed. Neoverse-N1: +29.4% Intel Coffeelake: +13.1% AMD 5950X:+17.5% gcc/ChangeLog: * Makefile.in: Add avoid-store-forwarding.o. * common.opt: New option -favoid-store-forwarding. * params.opt: New param store-forwarding-max-distance. * doc/invoke.texi: Document new pass. * doc/passes.texi: Document new pass. * passes.def: Schedule a new pass. * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. * avoid-store-forwarding.cc: New file. gcc/testsuite/ChangeLog: * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. Just a note for others. I've sent Manolis's a few more failures spit out by my tester. The crosses aren't quite all clean, but they're getting closer. Once the crosses are clean we'd run the QEMU emulated targets which take dramatically longer (but which are also a deeper test for this kind of change). Jeff
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On Fri, 2024-06-14 at 12:31 +0200, Richard Biener wrote: > The following retires vcond{,u,eq} optabs by stopping to use them > from the middle-end. Targets instead (should) implement vcond_mask > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > possibly affected targets - those implementing these patterns, > and in particular it lists mips, sparc and ia64 as targets that > most definitely will regress while others might simply remove > their vcond{,u,eq} patterns. > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > I know riscv doesn't implement any of the legacy optabs. But less > maintained vector targets might need adjustments. No new test failures on LoongArch. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]
On 6/14/24 19:45, Xi Ruoyao wrote: On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote: diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure index 5645e991af7..17dbae7bd87 100755 --- a/libstdc++-v3/configure +++ b/libstdc++-v3/configure @@ -5080,7 +5080,7 @@ else We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; This shouldn't happen. Please regenerate using *vanilla* autoconf-2.69. Sorry, I was surprised myself, when I looked at the changes. But after looking at the actual change I thought it doesn't matter. I used Debian's version, I'll try to get a vanilla version working tomorrow (I actually plan anyway to run a test with the built libstdc++ on simulavr). Detlef
[COMMITTED] Do not assume LHS of call is an ssa-name.
gimple_range_fold makes an assumption that if there is a LHS on a call that it is an ssa_name. Especially later in compilation that may not be true. This patch merely avoids calling routines that assume an ssa-name is being passed in. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From cef90b0558032b64b021054370ad5b96f8a5c5d8 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Wed, 12 Jun 2024 09:20:20 -0400 Subject: [PATCH 1/7] Do not assume LHS of call is an ssa-name. gimple_range_fold makes an assumption that the LHS of a call is an ssa_name, which later in compilation may not be true. * gimple-range-fold.cc (fold_using_range::range_of_call): Ensure LHS is an SSA_NAME before invoking gimple_range_global. --- gcc/gimple-range-fold.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc index 6037c29ce11..52fc3f2cb04 100644 --- a/gcc/gimple-range-fold.cc +++ b/gcc/gimple-range-fold.cc @@ -1105,7 +1105,7 @@ fold_using_range::range_of_call (vrange &r, gcall *call, fur_source &) } // If there is an LHS, intersect that with what is known. - if (lhs) + if (gimple_range_ssa_p (lhs)) { Value_Range def (TREE_TYPE (lhs)); gimple_range_global (def, lhs); -- 2.45.0
[COMMITTED] Add merge facility to ssa_lazy_cache.
The ssa_lazy_cache object has a routine to merge a range for an ssa-name with an existing range in the cache. This adds a method which will merge all elements of another ssa_lazy_cache the ssa_lazy_cache is a reasonably efficient object for storing ranges associated a few ssa-names, and it is used in a few places by ranger. Bootstraps on x86_64-pc-linux-gnu with no regressions. There shouldn't be, there are no callers yet :-) Pushed. Andrew From e2679227829ffe2ff4fb974994aacd639219f855 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Thu, 13 Jun 2024 15:35:55 -0400 Subject: [PATCH 2/7] Add merge facility to ssa_lazy_cache. The ssa_lazy_cache has a routine to merge a range for an ssa-name with an existing range in the cache. This adds a method which will merge all elements of another ssa_lazy_cache. * gimple-range-cache.cc (ssa_lazy_cache::merge): New. * gimple-range-cache.h (ssa_lazy_cache::merge): New prototype. --- gcc/gimple-range-cache.cc | 18 ++ gcc/gimple-range-cache.h | 1 + 2 files changed, 19 insertions(+) diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index a511a2c3a4c..efaae2ed928 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -729,6 +729,24 @@ ssa_lazy_cache::merge_range (tree name, const vrange &r) return true; } +// Merge all elements of CACHE with this cache. +// Any names in CACHE that are not in this one are added. +// Any names in both are merged via merge_range.. + +void +ssa_lazy_cache::merge (const ssa_lazy_cache &cache) +{ + unsigned x; + bitmap_iterator bi; + EXECUTE_IF_SET_IN_BITMAP (cache.active_p, 0, x, bi) +{ + tree name = ssa_name (x); + Value_Range r(TREE_TYPE (name)); + cache.get_range (r, name); + merge_range (ssa_name (x), r); +} +} + // Return TRUE if NAME has a range, and return it in R. bool diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h index c7499f928a9..63410d5437e 100644 --- a/gcc/gimple-range-cache.h +++ b/gcc/gimple-range-cache.h @@ -87,6 +87,7 @@ public: virtual bool get_range (vrange &r, tree name) const; virtual void clear_range (tree name); virtual void clear (); + void merge (const ssa_lazy_cache &); protected: bitmap active_p; }; -- 2.45.0
[COMMITTED] Dont add varying values to gori_on_edge mass, calculations.
The 'gori_on_edge' routine is an API which provides all contextual names that can be calculated on an outgoing edge at once. It provides this via the ssa_lazy_cache object. This patch recognizes that if the value produces VARYING, we should not put it in the list, nor continue processing the dependency chain. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From 2f90cb8b36affbb28dba643dd38068fc88c76e12 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Fri, 14 Jun 2024 11:01:08 -0400 Subject: [PATCH 3/7] Dont add varying values to gori_on_edge mass calculations. gori_on_edge will return an ssa_lazy_cache with all contextual ranges that can be generated by an edge. This patch adjusts it so that a VARYING range is never added. * gimple-range-gori.cc (gori_calc_operands): Do not continue nor add the range when VARYING is produced for an operand. --- gcc/gimple-range-gori.cc | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index d489aef312c..4f6073c715a 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -1605,11 +1605,14 @@ gori_calc_operands (vrange &lhs, gimple *stmt, ssa_cache &r, range_query *q) tmp.set_type (TREE_TYPE (si.ssa1)); if (si.calc_op1 (tmp, lhs, si.op2_range)) si.op1_range.intersect (tmp); - r.set_range (si.ssa1, si.op1_range); - gimple *src = SSA_NAME_DEF_STMT (si.ssa1); - // If defintion is in the same basic lock, evaluate it. - if (src && gimple_bb (src) == gimple_bb (stmt)) - gori_calc_operands (si.op1_range, src, r, q); + if (!si.op1_range.varying_p ()) + { + r.set_range (si.ssa1, si.op1_range); + gimple *src = SSA_NAME_DEF_STMT (si.ssa1); + // If defintion is in the same basic lock, evaluate it. + if (src && gimple_bb (src) == gimple_bb (stmt)) + gori_calc_operands (si.op1_range, src, r, q); + } } if (si.ssa2 && !r.has_range (si.ssa2)) @@ -1617,10 +1620,13 @@ gori_calc_operands (vrange &lhs, gimple *stmt, ssa_cache &r, range_query *q) tmp.set_type (TREE_TYPE (si.ssa2)); if (si.calc_op2 (tmp, lhs, si.op1_range)) si.op2_range.intersect (tmp); - r.set_range (si.ssa2, si.op2_range); - gimple *src = SSA_NAME_DEF_STMT (si.ssa2); - if (src && gimple_bb (src) == gimple_bb (stmt)) - gori_calc_operands (si.op2_range, src, r, q); + if (!si.op2_range.varying_p ()) + { + r.set_range (si.ssa2, si.op2_range); + gimple *src = SSA_NAME_DEF_STMT (si.ssa2); + if (src && gimple_bb (src) == gimple_bb (stmt)) + gori_calc_operands (si.op2_range, src, r, q); + } } } -- 2.45.0
Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs
Hi Andre, the patch looks fairly simple and obvious, so OK from my side. *** Regarding the testsuite: since you renamed one of the testcases gfortran.dg/coarray_alloc_comp_* and moved it to gfortran.dg/coarray/, I checked and noticed that there are other similar runtime tests for coarrays (while some are compile-time only tests). Do we plan to "clean" this up and move more/all related runtime tests to the coarray/ subdirectory? What is the general opinion on this? *** Thanks for the patch! Harald Am 14.06.24 um 09:22 schrieb Andre Vehreschild: Hi all, I messed up renaming of the coarray_alloc_comp-test. This is fixed in the second version of the patch. Sorry for the inconvenience. Additionally I figured that this patch also fixed PR fortran/103112. Regtests ok on x86_64 Fedora 39. Ok for mainline? Regards, Andre On Tue, 11 Jun 2024 16:12:38 +0200 Andre Vehreschild wrote: Hi all, attached patch has already been present in 2020, but lost my attention. It fixes an ICE in the testsuite. The old mails description is: attached patch fixes PR96418 where the code in the testsuite when compiled with -fcoarray=single lead to an ICE. The reason was that the coarray object was derefed as an array, but it was no array. Introducing the test for the descriptor removes the ICE. Regtests ok on x86_64-linux/Fedora 39. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de