[PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB
From: Pan Li

We missed one match pattern for the unsigned scalar .SAT_SUB, aka
form 11.

Form 11:
  #define SAT_SUB_U_11(T) \
  T sat_sub_u_11_##T (T x, T y) \
  { \
    T ret; \
    bool overflow = __builtin_sub_overflow (x, y, &ret); \
    return overflow ? 0 : ret; \
  }

Thus, add the above form 11 to the match pattern
gimple_unsigned_integer_sat_sub.

The following test suites passed for this patch:
1. The rv64gcv full regression test with newlib.
2. The rv64gcv build with glibc.
3. The x86 bootstrap test.
4. The x86 full regression test.

gcc/ChangeLog:

        * match.pd: Add form 11 match pattern for .SAT_SUB.

Signed-off-by: Pan Li
---
 gcc/match.pd | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 99968d316ed..5c330a43ed0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3186,13 +3186,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
       && types_match (type, @0, @1))))
 
-/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW).  */
+/* Unsigned saturation sub, case 7 (branch eq with .SUB_OVERFLOW).  */
 (match (unsigned_integer_sat_sub @0 @1)
  (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
   (realpart @2) integer_zerop)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
       && types_match (type, @0, @1))))
 
+/* Unsigned saturation sub, case 8 (branch ne with .SUB_OVERFLOW).  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
+   integer_zerop (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
--
2.34.1
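For illustration, form 11 can be instantiated and exercised as a standalone C file. This is only a sketch: the two instantiated types and the check_sat_sub_u_11 helper are choices made here and are not part of the patch or its testsuite. It assumes a compiler that provides __builtin_sub_overflow (GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 11 as quoted in the commit message.  */
#define SAT_SUB_U_11(T) \
T sat_sub_u_11_##T (T x, T y) \
{ \
  T ret; \
  bool overflow = __builtin_sub_overflow (x, y, &ret); \
  return overflow ? 0 : ret; \
}

/* Instantiations chosen for illustration only.  */
SAT_SUB_U_11 (uint8_t)
SAT_SUB_U_11 (uint32_t)

/* Hypothetical helper (not from the patch): checks that the result
   is x - y when x >= y and saturates to 0 otherwise.  */
void
check_sat_sub_u_11 (void)
{
  assert (sat_sub_u_11_uint8_t (5, 3) == 2);
  assert (sat_sub_u_11_uint8_t (3, 5) == 0);
  assert (sat_sub_u_11_uint8_t (255, 0) == 255);
  assert (sat_sub_u_11_uint32_t (0u, 1u) == 0u);
}
```

The "overflow ? 0 : ret" shape is what distinguishes this form from case 7: the condition tests for overflow being set (ne) rather than clear (eq), so the two arms of the conditional appear in the opposite order.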
Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs
Hi Harald,

thank you very much for the review. Committed as: gcc-15-1369-gdb75a6657e9

Regarding your question on the coarray tests that are not in the
coarray directory: In most cases these tests exercise only one method of
implementing coarrays, i.e., they test either just -fcoarray=single or
-fcoarray=lib -lcaf_single, which are two different approaches. The tests
in the coarray directory test all available methods of implementing
coarrays. Moving all coarray tests into the coarray directory would make
many of them fail, because the behavior of -fcoarray=single and
-fcoarray=lib -lcaf_single differs in some corner cases. That's why the
coarray tests in the main gfortran directory are kept separate. I do
understand why it may be confusing, but I don't see an easy solution.

Does this answer your question?

Thanks again for the review.

Regards,
        Andre

On Fri, 14 Jun 2024 21:43:47 +0200
Harald Anlauf wrote:

> Hi Andre,
>
> the patch looks fairly simple and obvious, so OK from my side.
>
> ***
>
> Regarding the testsuite: since you renamed one of the testcases
> gfortran.dg/coarray_alloc_comp_* and moved it to gfortran.dg/coarray/,
> I checked and noticed that there are other similar runtime tests for
> coarrays (while some are compile-time only tests).
>
> Do we plan to "clean" this up and move more/all related runtime
> tests to the coarray/ subdirectory?  What is the general opinion on
> this?
>
> ***
>
> Thanks for the patch!
>
> Harald
>
>
> On 14.06.24 at 09:22, Andre Vehreschild wrote:
> > Hi all,
> >
> > I messed up renaming of the coarray_alloc_comp-test. This is fixed in the
> > second version of the patch. Sorry for the inconvenience.
> >
> > Additionally I figured that this patch also fixed PR fortran/103112.
> >
> > Regtests ok on x86_64 Fedora 39. Ok for mainline?
> >
> > Regards,
> > 	Andre
> >
> > On Tue, 11 Jun 2024 16:12:38 +0200
> > Andre Vehreschild wrote:
> >
> >> Hi all,
> >>
> >> attached patch has already been present in 2020, but lost my attention. It
> >> fixes an ICE in the testsuite. The old mails description is:
> >>
> >> attached patch fixes PR96418 where the code in the testsuite when compiled
> >> with -fcoarray=single lead to an ICE. The reason was that the coarray
> >> object was derefed as an array, but it was no array. Introducing the test
> >> for the descriptor removes the ICE.
> >>
> >> Regtests ok on x86_64-linux/Fedora 39. Ok for mainline?
> >>
> >> Regards,
> >>	Andre
> >> --
> >> Andre Vehreschild * Email: vehre ad gmx dot de
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>

--
Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH Committed][APX ZU] Fix test for target-support check
Fix test for APX ZU. Add attribute for no-inline and target APX, and
target-support check.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Committed as an obvious patch.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/apx-zu-1.c: Add attribute for noinline,
        and target apx.
        * gcc.target/i386/apx-zu-2.c: Add target-support check.
---
 gcc/testsuite/gcc.target/i386/apx-zu-1.c | 6 ++++++
 gcc/testsuite/gcc.target/i386/apx-zu-2.c | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/apx-zu-1.c b/gcc/testsuite/gcc.target/i386/apx-zu-1.c
index 927a87673a7..bc0e7fbb4dd 100644
--- a/gcc/testsuite/gcc.target/i386/apx-zu-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-zu-1.c
@@ -9,26 +9,32 @@
 /* { dg-final { scan-assembler-times "setzue" 1} } */
 /* { dg-final { scan-assembler-times "setzuge" 1} } */
 /* { dg-final { scan-assembler "imulzu"} } */
+
+__attribute__((noinline, noclone, target("apxf")))
 long long foo0 (int a)
 {
   return a == 0 ? 0 : 1;
 }
 
+__attribute__((noinline, noclone, target("apxf")))
 long foo1 (int a, int b)
 {
   return a > b ? 0 : 1;
 }
 
+__attribute__((noinline, noclone, target("apxf")))
 int foo2 (int a, int b)
 {
   return a != b ? 0 : 1;
 }
 
+__attribute__((noinline, noclone, target("apxf")))
 short foo3 (int a, int b)
 {
   return a < b ? 0 : 1;
 }
 
+__attribute__((noinline, noclone, target("apxf")))
 unsigned long
 f1(unsigned short x)
 {
diff --git a/gcc/testsuite/gcc.target/i386/apx-zu-2.c b/gcc/testsuite/gcc.target/i386/apx-zu-2.c
index 3ee04495d98..7585492bd7c 100644
--- a/gcc/testsuite/gcc.target/i386/apx-zu-2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-zu-2.c
@@ -5,6 +5,9 @@
 
 int main(void)
 {
+  if (!__builtin_cpu_supports ("apxf"))
+    return 0;
+
   if (foo0 (0))
     __builtin_abort ();
   if (foo1 (3, 2))
--
2.31.1
[Patch-2v3, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]
Hi,
  This patch creates an insn_and_split pattern which lets the fwprop
pass replace the source pseudo of a store insn with the duplicated
constant vector. Thus the store can be implemented by a single stxvd2x,
which eliminates the unnecessary byte swap insn on P8 LE. The test case
shows the optimization.

  The patch depends on the first generic patch which uses insn cost in fwprop.

  Compared to the previous version, the main change is to move
"can_create_pseudo_p ()" to the insn condition.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
        PR target/113325
        * config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
        PR target/113325
        * gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..d350c92141c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3368,6 +3368,31 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+        (match_operand:VSX_W 1 "immediate_operand" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && !TARGET_P9_VECTOR
+   && const_vec_duplicate_p (operands[1])
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+        (match_dup 1))
+   (set (match_dup 0)
+        (vec_select:VSX_W
+          (match_dup 2)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+{
+  /* Here all the constants must be loaded without memory.  */
+  gcc_assert (easy_altivec_constant (operands[1], <MODE>mode));
+  operands[2] = gen_reg_rtx (<MODE>mode);
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 00000000000..3ca1fcbc9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
Re: [PATCH-1v4] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
HAO CHEN GUI writes: > Hi, > This patch replaces rtx_cost with insn_cost in forward propagation. > In the PR, one constant vector should be propagated and replace a > pseudo in a store insn if we know it's a duplicated constant vector. > It reduces the insn cost but not rtx cost. In this case, the cost is > determined by destination operand (memory or pseudo). Unfortunately, > rtx cost can't help. > > The test case is added in the second rs6000 specific patch. > > Compared to previous version, the main changes are: > 1. Invalidate recog_data when the cached INSN is swapped out. > 2. Pass strict_p according to prop.likely_profitable_p () to > change_is_worthwhile. > > Previous version > https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654276.html > > > The patch causes a regression cases on i386 as the pattern cost > regulation has a bug. Please refer the patch and discussion here. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html > > Bootstrapped and tested on powerpc64-linux BE and LE with no > regressions. Is it OK for the trunk? > > ChangeLog > fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile > > gcc/ > * fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile > to judge if a replacement is worthwhile. > * recog.cc (swap_change): Invalidate recog_data when the cached INSN > is swapped out. > * rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Check if the > insn cost of new rtl is unknown and fail the replacement. > > patch.diff > diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc > index de543923b92..4a9f68b66b1 100644 > --- a/gcc/fwprop.cc > +++ b/gcc/fwprop.cc > @@ -471,29 +471,18 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, > insn_change &use_change, >redo_changes (0); > } > > - /* ??? In theory, it should be better to use insn costs rather than > - set_src_costs here. That would involve replacing this code with > - change_is_worthwhile. 
*/ >bool ok = recog (attempt, use_change); > - if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()) > -if (rtx use_set = single_set (use_rtl)) > - { > - bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl)); > - temporarily_undo_changes (0); > - auto old_cost = set_src_cost (SET_SRC (use_set), > - GET_MODE (SET_DEST (use_set)), speed); > - redo_changes (0); > - auto new_cost = set_src_cost (SET_SRC (use_set), > - GET_MODE (SET_DEST (use_set)), speed); > - if (new_cost > old_cost > - || (new_cost == old_cost && !prop.likely_profitable_p ())) > - { > - if (dump_file) > - fprintf (dump_file, "change not profitable" > -" (cost %d -> cost %d)\n", old_cost, new_cost); > - ok = false; > - } > - } > + if (ok && !prop.changed_mem_p () && !use_insn->is_asm () > + && single_set (use_rtl)) I don't think we should keep the single_set condition after this change. insn_cost can handle all instructions. OK for trunk with that removed. Thanks, Richard > +{ > + bool strict_p = !prop.likely_profitable_p (); > + if (!change_is_worthwhile (use_change, strict_p)) > + { > + if (dump_file) > + fprintf (dump_file, "change not profitable"); > + ok = false; > + } > +} > >if (!ok) > { > diff --git a/gcc/recog.cc b/gcc/recog.cc > index a6799e3f5e6..56370e40e01 100644 > --- a/gcc/recog.cc > +++ b/gcc/recog.cc > @@ -614,7 +614,11 @@ swap_change (int num) >else > std::swap (*changes[num].loc, changes[num].old); >if (changes[num].object && !MEM_P (changes[num].object)) > -std::swap (INSN_CODE (changes[num].object), changes[num].old_code); > +{ > + std::swap (INSN_CODE (changes[num].object), changes[num].old_code); > + if (recog_data.insn == changes[num].object) > + recog_data.insn = nullptr; > +} > } > > /* Temporarily undo all the changes numbered NUM and up, with a view > diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc > index 11639e81bb7..c5ac4956a19 100644 > --- a/gcc/rtl-ssa/changes.cc > +++ b/gcc/rtl-ssa/changes.cc > @@ -186,6 +186,14 @@ 
rtl_ssa::changes_are_worthwhile (array_slice *const> changes, >if (!change->is_deletion ()) > { > change->new_cost = insn_cost (change->rtl (), for_speed); > + /* If the cost is unknown, replacement is not worthwhile. */ > + if (!change->new_cost) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + "Reject replacement due to unknown insn cost.\n"); > + return false; > + } > new_cost += change->new_cost; > if (for_speed) > weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)
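The acceptance rule discussed in this thread can be sketched in plain C. This is a toy model, not GCC code: toy_change_is_worthwhile and its integer costs are hypothetical stand-ins. It assumes, going only by the code removed from fwprop.cc above, that an equal-cost replacement is accepted only when strict_p is false, and it mirrors the new check that treats a zero insn cost as "unknown".

```c
#include <stdbool.h>

/* Toy model (hypothetical, not the rtl-ssa implementation): decide
   whether replacing an instruction of cost OLD_COST by one of cost
   NEW_COST is worthwhile.  STRICT_P corresponds to
   !prop.likely_profitable_p () in the quoted patch.  */
bool
toy_change_is_worthwhile (int old_cost, int new_cost, bool strict_p)
{
  /* A cost of zero means the new instruction's cost is unknown,
     so the replacement is rejected.  */
  if (new_cost == 0)
    return false;

  /* A strictly cheaper replacement is accepted, a strictly more
     expensive one is rejected.  */
  if (new_cost != old_cost)
    return new_cost < old_cost;

  /* On a tie, accept only when strictness is not required.  */
  return !strict_p;
}
```

With this shape, a propagation that is "likely profitable" passes strict_p as false and therefore survives a cost tie, which matches the tie-breaking in the set_src_cost code that the patch deletes.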
Re: [PATCH V4] Add warning options -W[no-]compare-distinct-pointer-types
> On Thu, 17 Aug 2023, Jose E. Marchesi via Gcc-patches wrote:
>> [Changes from V3:
> :
>> LLVM supports an option -W[no-]compare-distinct-pointer-types that can
>> be used in order to enable or disable the emission of such warnings.
>
> It looks this went in, alas is not covered in gcc-14/changes.html?
>
> Was that intentional? If not, would you mind considering adding something?

Will do.
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
Richard Biener writes:
> On Fri, 14 Jun 2024, Richard Biener wrote:
>
>> On Fri, 14 Jun 2024, Richard Sandiford wrote:
>>
>> > Richard Biener writes:
>> > > On Fri, 14 Jun 2024, Richard Sandiford wrote:
>> > >
>> > >> Richard Biener writes:
>> > >> > The following retires vcond{,u,eq} optabs by stopping to use them
>> > >> > from the middle-end.  Targets instead (should) implement vcond_mask
>> > >> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
>> > >> > possibly affected targets - those implementing these patterns,
>> > >> > and in particular it lists mips, sparc and ia64 as targets that
>> > >> > most definitely will regress while others might simply remove
>> > >> > their vcond{,u,eq} patterns.
>> > >> >
>> > >> > I'd appreciate testing, I do not expect fallout for x86 or
>> > >> > arm/aarch64.
>> > >> > I know riscv doesn't implement any of the legacy optabs.  But less
>> > >> > maintained vector targets might need adjustments.
>> > >> >
>> > >> > I want to get rid of those optabs for GCC 15.  If I don't hear from
>> > >> > you I will assume your target is fine.
>> > >>
>> > >> Great!  Thanks for doing this.
>> > >>
>> > >> Is there a plan for how we should handle vector comparisons that
>> > >> have to be done as the inverse of the negated condition?  Should
>> > >> targets simply not provide vec_cmp for such conditions and leave
>> > >> the target-independent code to deal with the fallout?  (For a
>> > >> standalone comparison, it would invert the result.  For a VEC_COND_EXPR
>> > >> it would swap the true and false values.)
>> > >
>> > > I would expect that the ISEL pass which currently deals with finding
>> > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
>> > > So how do we deal with this right now?  I expect RTL expansion will
>> > > do the inverse trick, no?
>> >
>> > I think in practice (at least for the targets I've worked on),
>> > the target's vec_cmp handles the inversion itself.  Thus the
>> > main optimisation done by targets' vcond patterns is to avoid
>> > the inversion (and instead swap the true/false values) when the
>> > "opposite" comparison is the native one.
>>
>> I see.  I suppose whether or not vec_cmp is handled is determined
>> by a FAIL so it's somewhat difficult to determine this at ISEL time.

In principle we could say that the predicates should accept only the
conditions that can be done natively.  Then target-independent code
can apply the usual approaches to generating other conditions
(which tend to be replicated across targets anyway).

> I'll also note that we document vec_cmp{,u,eq} as having all zeros,
> all ones for the result while vcond_mask might only care for the MSB
> (it's documented to work on the result of a pre-computed vector
> comparison).

Not sure how much the docs reflect reality.  At least for SVE,
vec_cmp returns 0/1 results for vector boolean modes.  But I think for
integer comparison results, vec_cmp must produce 0/-1 and vcond only
accepts 0/-1.

> So this eventually asks for targets to work out the optimal sequence
> via combine helpers and thus eventually splitters to fixup invalid
> compare operators late?

I really hope we can do this in late gimple & expand.

Thanks,
Richard
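The "invert the result" versus "swap the true and false values" distinction can be illustrated with scalar C loops standing in for vector operations. This is a sketch only: cmp_lt, cmp_ge_via_lt and blend are hypothetical names, not GCC optabs, and the vector length N and element type are arbitrary. Masks use the 0/-1 convention mentioned above for integer comparison results.

```c
#include <stdint.h>

#define N 4

/* Element-wise a < b, producing 0 / -1 masks.  This plays the role
   of the comparison that the target supports natively.  */
void
cmp_lt (const int32_t *a, const int32_t *b, int32_t *mask)
{
  for (int i = 0; i < N; i++)
    mask[i] = a[i] < b[i] ? -1 : 0;
}

/* Element-wise a >= b built as the inverse of a < b: one extra
   bitwise NOT per element on top of the native comparison.  */
void
cmp_ge_via_lt (const int32_t *a, const int32_t *b, int32_t *mask)
{
  cmp_lt (a, b, mask);
  for (int i = 0; i < N; i++)
    mask[i] = ~mask[i];
}

/* Element-wise mask ? t : f, done bitwise on 0 / -1 masks.  */
void
blend (const int32_t *mask, const int32_t *t, const int32_t *f,
       int32_t *out)
{
  for (int i = 0; i < N; i++)
    out[i] = (mask[i] & t[i]) | (~mask[i] & f[i]);
}
```

Selecting with the inverted mask, blend (ge, t, f, out), gives the same result as selecting with the native mask and swapped arms, blend (lt, f, t, out). The second form avoids the extra NOT, which is the saving described above for conditional selects.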
[PATCH 0/8] Follow-on force_subreg patches
This series expands on the fix for PR115464 by using force_subreg
in more places.  It also adds some convenience wrappers for lowpart
and highpart subregs.

A part of this will need to be backported after a grace period,
but I'll post the cherry-picked parts separately.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard Sandiford (8):
  Make force_subreg emit nothing on failure
  aarch64: Use force_subreg in more places
  Make more use of force_subreg
  Add force_lowpart_subreg
  aarch64: Add some uses of force_lowpart_subreg
  Make more use of force_lowpart_subreg
  Add force_highpart_subreg
  aarch64: Add some uses of force_highpart_subreg

 gcc/builtins.cc                               | 22 +++---
 gcc/config/aarch64/aarch64-builtins.cc        | 15 +++
 gcc/config/aarch64/aarch64-simd.md            |  4 +-
 .../aarch64/aarch64-sve-builtins-base.cc      | 10 ++---
 .../aarch64/aarch64-sve-builtins-functions.h  |  6 +--
 .../aarch64/aarch64-sve-builtins-sme.cc       |  2 +-
 gcc/config/aarch64/aarch64.cc                 | 31 -
 gcc/explow.cc                                 | 34 +-
 gcc/explow.h                                  |  2 +
 gcc/expmed.cc                                 | 26 ---
 gcc/expr.cc                                   | 44 +--
 gcc/optabs.cc                                 | 26 ++-
 .../aarch64/sve/acle/general/pr115464_2.c     | 11 +
 13 files changed, 111 insertions(+), 122 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c

--
2.25.1
[PATCH 1/8] Make force_subreg emit nothing on failure
While adding more uses of force_subreg, I realised that it should
be more careful to emit no instructions on failure.  This kind of
failure should be very rare, so I don't think it's a case worth
optimising for.

gcc/
        * explow.cc (force_subreg): Emit no instructions on failure.
---
 gcc/explow.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index f6843398c4b..bd93c878064 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -756,8 +756,12 @@ force_subreg (machine_mode outermode, rtx op,
   if (x)
     return x;
 
+  auto *start = get_last_insn ();
   op = copy_to_mode_reg (innermode, op);
-  return simplify_gen_subreg (outermode, op, innermode, byte);
+  rtx res = simplify_gen_subreg (outermode, op, innermode, byte);
+  if (!res)
+    delete_insns_since (start);
+  return res;
 }
 
 /* If X is a memory ref, copy its contents to a new temp reg and return
--
2.25.1
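The "record a mark, emit speculatively, roll back on failure" idea behind this change can be shown with a toy model in C. Nothing here is GCC API: struct toy_stream and the toy_* functions are hypothetical stand-ins for the insn stream, get_last_insn, the emitted copy and delete_insns_since.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_INSNS = 8 };

/* Hypothetical stand-in for the stream of emitted instructions.  */
struct toy_stream
{
  int insns[TOY_MAX_INSNS];
  size_t count;
};

/* Return a mark for the current end of the stream.  */
size_t
toy_get_last (const struct toy_stream *s)
{
  return s->count;
}

/* Append one instruction if there is room.  */
void
toy_emit (struct toy_stream *s, int insn)
{
  if (s->count < TOY_MAX_INSNS)
    s->insns[s->count++] = insn;
}

/* Drop everything emitted after the mark START.  */
void
toy_delete_since (struct toy_stream *s, size_t start)
{
  s->count = start;
}

/* Emit a copy (instruction 1), then attempt a final step that may
   fail.  On failure the stream is restored to its state on entry,
   so the caller sees no leftover instructions.  */
bool
toy_force (struct toy_stream *s, bool final_step_succeeds)
{
  size_t start = toy_get_last (s);
  toy_emit (s, 1);
  if (!final_step_succeeds)
    {
      toy_delete_since (s, start);
      return false;
    }
  return true;
}
```

The point of the pattern is that callers can treat a null or false result as "nothing happened" and try another strategy without first cleaning up a dead copy.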
[PATCH 2/8] aarch64: Use force_subreg in more places
This patch makes the aarch64 code use force_subreg instead of simplify_gen_subreg in more places. The criteria were: (1) The code is obviously specific to expand (where new pseudos can be created). (2) The value is obviously an rvalue rather than an lvalue. (3) The offset wasn't a simple lowpart or highpart calculation; a later patch will deal with those. gcc/ * config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin): Use force_subreg instead of simplify_gen_subreg. * config/aarch64/aarch64-simd.md (ctz2): Likewise. * config/aarch64/aarch64-sve-builtins-base.cc (svget_impl::expand): Likewise. (svget_neonq_impl::expand): Likewise. * config/aarch64/aarch64-sve-builtins-functions.h (multireg_permute::expand): Likewise. --- gcc/config/aarch64/aarch64-builtins.cc | 4 ++-- gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64-sve-builtins-base.cc | 8 +++- gcc/config/aarch64/aarch64-sve-builtins-functions.h | 6 +++--- 4 files changed, 10 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index d589e59defc..7d827cbc2ac 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -2592,12 +2592,12 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode) rtx temp2 = gen_reg_rtx (DImode); temp1 = simplify_gen_subreg (d->mode, op2, quadmode, subreg_lowpart_offset (d->mode, quadmode)); - temp1 = simplify_gen_subreg (V2DImode, temp1, d->mode, 0); + temp1 = force_subreg (V2DImode, temp1, d->mode, 0); if (BYTES_BIG_ENDIAN) emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx)); else emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const1_rtx)); - op2 = simplify_gen_subreg (d->mode, temp2, GET_MODE (temp2), 0); + op2 = force_subreg (d->mode, temp2, GET_MODE (temp2), 0); /* And recalculate the index. 
*/ lane -= nunits / 4; diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 0bb39091a38..01b084d8ccb 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -389,8 +389,8 @@ (define_expand "ctz2" "TARGET_SIMD" { emit_insn (gen_bswap2 (operands[0], operands[1])); - rtx op0_castsi2qi = simplify_gen_subreg(mode, operands[0], -mode, 0); + rtx op0_castsi2qi = force_subreg (mode, operands[0], + mode, 0); emit_insn (gen_aarch64_rbit (op0_castsi2qi, op0_castsi2qi)); emit_insn (gen_clz2 (operands[0], operands[0])); DONE; diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 823d60040f9..99932037124 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -1121,9 +1121,8 @@ public: expand (function_expander &e) const override { /* Fold the access into a subreg rvalue. */ -return simplify_gen_subreg (e.vector_mode (0), e.args[0], - GET_MODE (e.args[0]), - INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR); +return force_subreg (e.vector_mode (0), e.args[0], GET_MODE (e.args[0]), +INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR); } }; @@ -1157,8 +1156,7 @@ public: e.add_fixed_operand (indices); return e.generate_insn (icode); } -return simplify_gen_subreg (e.result_mode (), e.args[0], - GET_MODE (e.args[0]), 0); +return force_subreg (e.result_mode (), e.args[0], GET_MODE (e.args[0]), 0); } }; diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h index 3b8e575e98e..7d06a57ff83 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h +++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h @@ -639,9 +639,9 @@ public: { machine_mode elt_mode = e.vector_mode (0); rtx arg = e.args[0]; - e.args[0] = simplify_gen_subreg (elt_mode, arg, GET_MODE (arg), 0); - e.args.safe_push (simplify_gen_subreg (elt_mode, arg, GET_MODE (arg), - 
GET_MODE_SIZE (elt_mode))); + e.args[0] = force_subreg (elt_mode, arg, GET_MODE (arg), 0); + e.args.safe_push (force_subreg (elt_mode, arg, GET_MODE (arg), + GET_MODE_SIZE (elt_mode))); } return e.use_exact_insn (icode); } -- 2.25.1
[PATCH 3/8] Make more use of force_subreg
This patch makes target-independent code use force_subreg instead
of simplify_gen_subreg in some places.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
    can be created), or at least would be invalid to call when
    !can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

(3) The offset wasn't a simple lowpart or highpart calculation;
    a later patch will deal with those.

Doing this should reduce the likelihood of bugs like PR115464
occurring in other situations.

gcc/
        * expmed.cc (store_bit_field_using_insv): Use force_subreg
        instead of simplify_gen_subreg.
        (store_bit_field_1): Likewise.
        (extract_bit_field_as_subreg): Likewise.
        (extract_integral_bit_field): Likewise.
        (emit_store_flag_1): Likewise.
        * expr.cc (convert_move): Likewise.
        (convert_modes): Likewise.
        (emit_group_load_1): Likewise.
        (emit_group_store): Likewise.
        (expand_assignment): Likewise.
---
 gcc/expmed.cc | 22 --
 gcc/expr.cc   | 27 ---
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 9ba01695f53..1f68e7be721 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -695,13 +695,7 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
             if we must narrow it, be sure we do it correctly.  */
 
          if (GET_MODE_SIZE (value_mode) < GET_MODE_SIZE (op_mode))
-           {
-             tmp = simplify_subreg (op_mode, value1, value_mode, 0);
-             if (!
tmp) - tmp = simplify_gen_subreg (op_mode, - force_reg (value_mode, value1), - value_mode, 0); - } + tmp = force_subreg (op_mode, value1, value_mode, 0); else { if (targetm.mode_rep_extended (op_mode, value_mode) != UNKNOWN) @@ -806,7 +800,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, if (known_eq (bitnum, 0U) && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0 { - sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0); + sub = force_subreg (GET_MODE (op0), value, fieldmode, 0); if (sub) { if (reverse) @@ -1633,7 +1627,7 @@ extract_bit_field_as_subreg (machine_mode mode, rtx op0, && known_eq (bitsize, GET_MODE_BITSIZE (mode)) && lowpart_bit_field_p (bitnum, bitsize, op0_mode) && TRULY_NOOP_TRUNCATION_MODES_P (mode, op0_mode)) -return simplify_gen_subreg (mode, op0, op0_mode, bytenum); +return force_subreg (mode, op0, op0_mode, bytenum); return NULL_RTX; } @@ -2000,11 +1994,11 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode, return convert_extracted_bit_field (target, mode, tmode, unsignedp); } /* If OP0 is a hard register, copy it to a pseudo before calling -simplify_gen_subreg. */ +force_subreg. */ if (REG_P (op0) && HARD_REGISTER_P (op0)) op0 = copy_to_reg (op0); - op0 = simplify_gen_subreg (word_mode, op0, op0_mode.require (), -bitnum / BITS_PER_WORD * UNITS_PER_WORD); + op0 = force_subreg (word_mode, op0, op0_mode.require (), + bitnum / BITS_PER_WORD * UNITS_PER_WORD); op0_mode = word_mode; bitnum %= BITS_PER_WORD; } @@ -5774,8 +5768,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1, /* Do a logical OR or AND of the two words and compare the result. */ - op00 = simplify_gen_subreg (word_mode, op0, int_mode, 0); - op01 = simplify_gen_subreg (word_mode, op0, int_mode, UNITS_PER_WORD); + op00 = force_subreg (word_mode, op0, int_mode, 0); + op01 = force_subreg (word_mode, op0, int_mode, UNITS_PER_WORD); tem = expand_binop (word_mode, op1 == const0_rtx ? 
ior_optab : and_optab, op00, op01, NULL_RTX, unsignedp, diff --git a/gcc/expr.cc b/gcc/expr.cc index 9cecc1758f5..31a7346e33f 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -301,7 +301,7 @@ convert_move (rtx to, rtx from, int unsignedp) GET_MODE_BITSIZE (to_mode))); if (VECTOR_MODE_P (to_mode)) - from = simplify_gen_subreg (to_mode, from, GET_MODE (from), 0); + from = force_subreg (to_mode, from, GET_MODE (from), 0); else to = simplify_gen_subreg (from_mode, to, GET_MODE (to), 0); @@ -935,7 +935,7 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp) { gcc_assert (known_eq (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (oldmode))); - return simplify_gen_subreg (mode, x, oldmode, 0); + re
[PATCH 5/8] aarch64: Add some uses of force_lowpart_subreg
This patch makes more use of force_lowpart_subreg, similarly to the recent patch for force_subreg. The criteria were: (1) The code is obviously specific to expand (where new pseudos can be created). (2) The value is obviously an rvalue rather than an lvalue. gcc/ PR target/115464 * config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin) (aarch64_expand_rwsr_builtin): Use force_lowpart_subreg instead of simplify_gen_subreg and lowpart_subreg. * config/aarch64/aarch64-sve-builtins-base.cc (svset_neonq_impl::expand): Likewise. * config/aarch64/aarch64-sve-builtins-sme.cc (add_load_store_slice_operand): Likewise. * config/aarch64/aarch64.cc (aarch64_sve_reinterpret): Likewise. (aarch64_addti_scratch_regs, aarch64_subvti_scratch_regs): Likewise. gcc/testsuite/ PR target/115464 * gcc.target/aarch64/sve/acle/general/pr115464_2.c: New test. --- gcc/config/aarch64/aarch64-builtins.cc | 11 +-- gcc/config/aarch64/aarch64-sve-builtins-base.cc| 2 +- gcc/config/aarch64/aarch64-sve-builtins-sme.cc | 2 +- gcc/config/aarch64/aarch64.cc | 14 +- .../aarch64/sve/acle/general/pr115464_2.c | 11 +++ 5 files changed, 23 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 7d827cbc2ac..30669f8aa18 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -2579,8 +2579,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode) int lane = INTVAL (lane_idx); if (lane < nunits / 4) -op2 = simplify_gen_subreg (d->mode, op2, quadmode, - subreg_lowpart_offset (d->mode, quadmode)); +op2 = force_lowpart_subreg (d->mode, op2, quadmode); else { /* Select the upper 64 bits, either a V2SF or V4HF, this however @@ -2590,8 +2589,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode) gen_highpart_mode generates code that isn't optimal. 
*/ rtx temp1 = gen_reg_rtx (d->mode); rtx temp2 = gen_reg_rtx (DImode); - temp1 = simplify_gen_subreg (d->mode, op2, quadmode, - subreg_lowpart_offset (d->mode, quadmode)); + temp1 = force_lowpart_subreg (d->mode, op2, quadmode); temp1 = force_subreg (V2DImode, temp1, d->mode, 0); if (BYTES_BIG_ENDIAN) emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx)); @@ -2836,7 +2834,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode) case AARCH64_WSR64: case AARCH64_WSRF64: case AARCH64_WSR128: - subreg = lowpart_subreg (sysreg_mode, input_val, mode); + subreg = force_lowpart_subreg (sysreg_mode, input_val, mode); break; case AARCH64_WSRF: subreg = gen_lowpart_SUBREG (SImode, input_val); @@ -2871,7 +2869,8 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode) case AARCH64_RSR64: case AARCH64_RSRF64: case AARCH64_RSR128: - return lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)), target, sysreg_mode); + return force_lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)), + target, sysreg_mode); case AARCH64_RSRF: subreg = gen_lowpart_SUBREG (SImode, target); return gen_lowpart_SUBREG (SFmode, subreg); diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 99932037124..aa26370d397 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -1183,7 +1183,7 @@ public: if (BYTES_BIG_ENDIAN) return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode)); insn_code icode = code_for_vcond_mask (mode, mode); -e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1])); +e.args[1] = force_lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1])); e.add_output_operand (icode); e.add_input_operand (icode, e.args[1]); e.add_input_operand (icode, e.args[0]); diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc index f4c91bcbb95..b66b35ae60b 100644 --- 
a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc @@ -112,7 +112,7 @@ add_load_store_slice_operand (function_expander &e, insn_code icode, rtx base = e.args[argno]; if (e.mode_suffix_id == MODE_vnum) { - rtx vnum = lowpart_subreg (SImode, e.args[vnum_argno], DImode); + rtx vnum = force_lowpart_subreg (SImode, e.args[vnum_argno], DImode); base = simplify_gen_binary (PLUS, SImode, base, vnum); } e.add_input_operand (icode, base); diff --git a/gcc/config/aar
[PATCH 4/8] Add force_lowpart_subreg
optabs had a local function called lowpart_subreg_maybe_copy that is very similar to the lowpart version of force_subreg. This patch adds a force_lowpart_subreg wrapper around force_subreg and uses it in optabs.cc. The only difference between the old and new functions is that the old one asserted success while the new one doesn't. It's common not to assert elsewhere when taking subregs; normally a null result is enough. Later patches will make more use of the new function. gcc/ * explow.h (force_lowpart_subreg): Declare. * explow.cc (force_lowpart_subreg): New function. * optabs.cc (lowpart_subreg_maybe_copy): Delete. (expand_absneg_bit): Use force_lowpart_subreg instead of lowpart_subreg_maybe_copy. (expand_copysign_bit): Likewise. --- gcc/explow.cc | 14 ++ gcc/explow.h | 1 + gcc/optabs.cc | 24 ++-- 3 files changed, 17 insertions(+), 22 deletions(-) diff --git a/gcc/explow.cc b/gcc/explow.cc index bd93c878064..2a91cf76ea6 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -764,6 +764,20 @@ force_subreg (machine_mode outermode, rtx op, return res; } +/* Try to return an rvalue expression for the OUTERMODE lowpart of OP, + which has mode INNERMODE. Allow OP to be forced into a new register + if necessary. + + Return null on failure. */ + +rtx +force_lowpart_subreg (machine_mode outermode, rtx op, + machine_mode innermode) +{ + auto byte = subreg_lowpart_offset (outermode, innermode); + return force_subreg (outermode, op, innermode, byte); +} + /* If X is a memory ref, copy its contents to a new temp reg and return that reg. Otherwise, return X. 
*/ diff --git a/gcc/explow.h b/gcc/explow.h index cbd1fcb7eb3..dd654649b06 100644 --- a/gcc/explow.h +++ b/gcc/explow.h @@ -43,6 +43,7 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode); extern rtx force_reg (machine_mode, rtx); extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64); +extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode); /* Return given rtx, copied into a new temp reg if it was in memory. */ extern rtx force_not_mem (rtx); diff --git a/gcc/optabs.cc b/gcc/optabs.cc index c54d275b8b7..d569742beea 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -3096,26 +3096,6 @@ expand_ffs (scalar_int_mode mode, rtx op0, rtx target) return 0; } -/* Extract the OMODE lowpart from VAL, which has IMODE. Under certain - conditions, VAL may already be a SUBREG against which we cannot generate - a further SUBREG. In this case, we expect forcing the value into a - register will work around the situation. */ - -static rtx -lowpart_subreg_maybe_copy (machine_mode omode, rtx val, - machine_mode imode) -{ - rtx ret; - ret = lowpart_subreg (omode, val, imode); - if (ret == NULL) -{ - val = force_reg (imode, val); - ret = lowpart_subreg (omode, val, imode); - gcc_assert (ret != NULL); -} - return ret; -} - /* Expand a floating point absolute value or negation operation via a logical operation on the sign bit. 
*/ @@ -3204,7 +3184,7 @@ expand_absneg_bit (enum rtx_code code, scalar_float_mode mode, gen_lowpart (imode, op0), immed_wide_int_const (mask, imode), gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN); - target = lowpart_subreg_maybe_copy (mode, temp, imode); + target = force_lowpart_subreg (mode, temp, imode); set_dst_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_fmt_e (code, mode, copy_rtx (op0)), @@ -4043,7 +4023,7 @@ expand_copysign_bit (scalar_float_mode mode, rtx op0, rtx op1, rtx target, temp = expand_binop (imode, ior_optab, op0, op1, gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN); - target = lowpart_subreg_maybe_copy (mode, temp, imode); + target = force_lowpart_subreg (mode, temp, imode); } return target; -- 2.25.1
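For readers following the series, here is a toy model of the behavioural difference described above: try the lowpart directly, fall back to copying into a fresh register, and hand a null result back to the caller instead of asserting. This is only a sketch. `Value`, `lowpart`, `force_to_reg` and `force_lowpart` are hypothetical stand-ins, not GCC's rtx API, and the "cannot take a subreg of a subreg" rule is a deliberately crude assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an rtx; none of these names exist in GCC.
struct Value {
  unsigned bits;    // payload
  unsigned width;   // width in bits (the "mode"), at most 32
  bool is_subreg;   // true if no further subreg can be taken directly
};

// Models lowpart_subreg: fails on values that are already subregs,
// and when the requested width does not fit inside the value.
std::optional<Value> lowpart(unsigned outer_width, const Value& v) {
  if (v.is_subreg || outer_width == 0 || outer_width > v.width)
    return std::nullopt;
  unsigned mask = outer_width >= 32 ? ~0u : ((1u << outer_width) - 1u);
  return Value{v.bits & mask, outer_width, true};
}

// Models force_reg: copy the value into a fresh "register".
Value force_to_reg(const Value& v) {
  return Value{v.bits, v.width, false};
}

// Models the new wrapper: retry after forcing into a register, and
// report failure with an empty result rather than asserting.
std::optional<Value> force_lowpart(unsigned outer_width, const Value& v) {
  if (auto direct = lowpart(outer_width, v))
    return direct;
  return lowpart(outer_width, force_to_reg(v));
}
```

In this model a caller checks the result the same way the patched code checks for a null rtx, for example `if (auto lo = force_lowpart(8, v)) use(*lo);`, where `use` is whatever the caller does next.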
[PATCH 6/8] Make more use of force_lowpart_subreg
This patch makes target-independent code use force_lowpart_subreg instead of simplify_gen_subreg and lowpart_subreg in some places. The criteria were: (1) The code is obviously specific to expand (where new pseudos can be created), or at least would be invalid to call when !can_create_pseudo_p () and temporaries are needed. (2) The value is obviously an rvalue rather than an lvalue. Doing this should reduce the likelihood of bugs like PR115464 occurring in other situations. gcc/ * builtins.cc (expand_builtin_issignaling): Use force_lowpart_subreg instead of simplify_gen_subreg and lowpart_subreg. * expr.cc (convert_mode_scalar, expand_expr_real_2): Likewise. * optabs.cc (expand_doubleword_mod): Likewise. --- gcc/builtins.cc | 7 ++- gcc/expr.cc | 17 + gcc/optabs.cc | 2 +- 3 files changed, 12 insertions(+), 14 deletions(-) diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 5b5307c67b8..bde517b639e 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2940,8 +2940,7 @@ expand_builtin_issignaling (tree exp, rtx target) { hi = simplify_gen_subreg (imode, temp, fmode, subreg_highpart_offset (imode, fmode)); - lo = simplify_gen_subreg (imode, temp, fmode, - subreg_lowpart_offset (imode, fmode)); + lo = force_lowpart_subreg (imode, temp, fmode); if (!hi || !lo) { scalar_int_mode imode2; @@ -2951,9 +2950,7 @@ expand_builtin_issignaling (tree exp, rtx target) hi = simplify_gen_subreg (imode, temp2, imode2, subreg_highpart_offset (imode, imode2)); - lo = simplify_gen_subreg (imode, temp2, imode2, - subreg_lowpart_offset (imode, -imode2)); + lo = force_lowpart_subreg (imode, temp2, imode2); } } if (!hi || !lo) diff --git a/gcc/expr.cc b/gcc/expr.cc index 31a7346e33f..ffbac513692 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -423,7 +423,8 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp) 0).exists (&toi_mode)) { start_sequence (); - rtx fromi = lowpart_subreg (fromi_mode, from, from_mode); + rtx fromi = force_lowpart_subreg (fromi_mode, from, + from_mode); rtx tof = 
NULL_RTX; if (fromi) { @@ -443,7 +444,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp) NULL_RTX, 1); if (toi) { - tof = lowpart_subreg (to_mode, toi, toi_mode); + tof = force_lowpart_subreg (to_mode, toi, toi_mode); if (tof) emit_move_insn (to, tof); } @@ -475,7 +476,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp) 0).exists (&toi_mode)) { start_sequence (); - rtx fromi = lowpart_subreg (fromi_mode, from, from_mode); + rtx fromi = force_lowpart_subreg (fromi_mode, from, from_mode); rtx tof = NULL_RTX; do { @@ -510,11 +511,11 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp) temp4, shift, NULL_RTX, 1); if (!temp5) break; - rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode); + rtx temp6 = force_lowpart_subreg (toi_mode, temp5, + fromi_mode); if (!temp6) break; - tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6), - toi_mode); + tof = force_lowpart_subreg (to_mode, temp6, toi_mode); if (tof) emit_move_insn (to, tof); } @@ -9784,9 +9785,9 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode, inner_mode = TYPE_MODE (inner_type); if (modifier == EXPAND_INITIALIZER) - op0 = lowpart_subreg (mode, op0, inner_mode); + op0 = force_lowpart_subreg (mode, op0, inner_mode); else - op0= convert_modes (mode, inner_mode, op0, + op0 = convert_modes (mode, inner_mode, op0, TYPE_UNSIGNED (inner_type)); } diff --git a/gcc/optabs.cc b/gcc/optabs.cc index d569742beea..1
[PATCH 8/8] aarch64: Add some uses of force_highpart_subreg
This patch adds uses of force_highpart_subreg to places that already use force_lowpart_subreg. gcc/ * config/aarch64/aarch64.cc (aarch64_addti_scratch_regs): Use force_highpart_subreg instead of gen_highpart and simplify_gen_subreg. (aarch64_subvti_scratch_regs): Likewise. --- gcc/config/aarch64/aarch64.cc | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index c952a7cdefe..026f8627a89 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -26873,19 +26873,12 @@ aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest, *low_in1 = force_lowpart_subreg (DImode, op1, TImode); *low_in2 = force_lowpart_subreg (DImode, op2, TImode); *high_dest = gen_reg_rtx (DImode); - *high_in1 = gen_highpart (DImode, op1); - *high_in2 = simplify_gen_subreg (DImode, op2, TImode, - subreg_highpart_offset (DImode, TImode)); + *high_in1 = force_highpart_subreg (DImode, op1, TImode); + *high_in2 = force_highpart_subreg (DImode, op2, TImode); } /* Generate DImode scratch registers for 128-bit (TImode) subtraction. - This function differs from 'arch64_addti_scratch_regs' in that - OP1 can be an immediate constant (zero). We must call - subreg_highpart_offset with DImode and TImode arguments, otherwise - VOIDmode will be used for the const_int which generates an internal - error from subreg_size_highpart_offset which does not expect a size of zero. 
- OP1 represents the TImode destination operand 1 OP2 represents the TImode destination operand 2 LOW_DEST represents the low half (DImode) of TImode operand 0 @@ -26907,10 +26900,8 @@ aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest, *low_in2 = force_lowpart_subreg (DImode, op2, TImode); *high_dest = gen_reg_rtx (DImode); - *high_in1 = simplify_gen_subreg (DImode, op1, TImode, - subreg_highpart_offset (DImode, TImode)); - *high_in2 = simplify_gen_subreg (DImode, op2, TImode, - subreg_highpart_offset (DImode, TImode)); + *high_in1 = force_highpart_subreg (DImode, op1, TImode); + *high_in2 = force_highpart_subreg (DImode, op2, TImode); } /* Generate RTL for 128-bit (TImode) subtraction with overflow. -- 2.25.1
[PATCH 7/8] Add force_highpart_subreg
This patch adds a force_highpart_subreg to go along with the recently added force_lowpart_subreg. gcc/ * explow.h (force_highpart_subreg): Declare. * explow.cc (force_highpart_subreg): New function. * builtins.cc (expand_builtin_issignaling): Use it. * expmed.cc (emit_store_flag_1): Likewise. --- gcc/builtins.cc | 15 --- gcc/explow.cc | 14 ++ gcc/explow.h| 1 + gcc/expmed.cc | 4 +--- 4 files changed, 20 insertions(+), 14 deletions(-) diff --git a/gcc/builtins.cc b/gcc/builtins.cc index bde517b639e..d467d1697b4 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2835,9 +2835,7 @@ expand_builtin_issignaling (tree exp, rtx target) it is, working on the DImode high part is usually better. */ if (!MEM_P (temp)) { - if (rtx t = simplify_gen_subreg (imode, temp, fmode, - subreg_highpart_offset (imode, - fmode))) + if (rtx t = force_highpart_subreg (imode, temp, fmode)) hi = t; else { @@ -2845,9 +2843,7 @@ expand_builtin_issignaling (tree exp, rtx target) if (int_mode_for_mode (fmode).exists (&imode2)) { rtx temp2 = gen_lowpart (imode2, temp); - poly_uint64 off = subreg_highpart_offset (imode, imode2); - if (rtx t = simplify_gen_subreg (imode, temp2, - imode2, off)) + if (rtx t = force_highpart_subreg (imode, temp2, imode2)) hi = t; } } @@ -2938,8 +2934,7 @@ expand_builtin_issignaling (tree exp, rtx target) it is, working on DImode parts is usually better. 
*/ if (!MEM_P (temp)) { - hi = simplify_gen_subreg (imode, temp, fmode, - subreg_highpart_offset (imode, fmode)); + hi = force_highpart_subreg (imode, temp, fmode); lo = force_lowpart_subreg (imode, temp, fmode); if (!hi || !lo) { @@ -2947,9 +2942,7 @@ expand_builtin_issignaling (tree exp, rtx target) if (int_mode_for_mode (fmode).exists (&imode2)) { rtx temp2 = gen_lowpart (imode2, temp); - hi = simplify_gen_subreg (imode, temp2, imode2, - subreg_highpart_offset (imode, - imode2)); + hi = force_highpart_subreg (imode, temp2, imode2); lo = force_lowpart_subreg (imode, temp2, imode2); } } diff --git a/gcc/explow.cc b/gcc/explow.cc index 2a91cf76ea6..b4a0df89bc3 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -778,6 +778,20 @@ force_lowpart_subreg (machine_mode outermode, rtx op, return force_subreg (outermode, op, innermode, byte); } +/* Try to return an rvalue expression for the OUTERMODE highpart of OP, + which has mode INNERMODE. Allow OP to be forced into a new register + if necessary. + + Return null on failure. */ + +rtx +force_highpart_subreg (machine_mode outermode, rtx op, + machine_mode innermode) +{ + auto byte = subreg_highpart_offset (outermode, innermode); + return force_subreg (outermode, op, innermode, byte); +} + /* If X is a memory ref, copy its contents to a new temp reg and return that reg. Otherwise, return X. */ diff --git a/gcc/explow.h b/gcc/explow.h index dd654649b06..de89e9e2933 100644 --- a/gcc/explow.h +++ b/gcc/explow.h @@ -44,6 +44,7 @@ extern rtx force_reg (machine_mode, rtx); extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64); extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode); +extern rtx force_highpart_subreg (machine_mode, rtx, machine_mode); /* Return given rtx, copied into a new temp reg if it was in memory. 
*/ extern rtx force_not_mem (rtx); diff --git a/gcc/expmed.cc b/gcc/expmed.cc index 1f68e7be721..3b9475f5aa0 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -5784,9 +5784,7 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1, rtx op0h; /* If testing the sign bit, can just test on high word. */ - op0h = simplify_gen_subreg (word_mode, op0, int_mode, - subreg_highpart_offset (word_mode, - int_mode)); + op0h = force_highpart_subreg (word_mode, op0, int_mode); tem = emit_store_flag (NULL_RTX, code, op0h, op1, word_mode, unsignedp, normalizep); } -- 2.25.1
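As a reading aid for the lowpart/highpart pair, the sketch below shows how the two byte offsets relate in the simplest case. It assumes byte and word endianness agree and that the outer mode is no larger than the inner one; the real subreg_lowpart_offset and subreg_highpart_offset handle more cases than this. The function names and the plain byte-count parameters are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model (assumption: BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN and
// outer_bytes <= inner_bytes).  Sizes are in bytes, e.g. 8 for DImode
// and 16 for TImode.
constexpr std::size_t lowpart_offset(std::size_t outer_bytes,
                                     std::size_t inner_bytes,
                                     bool big_endian) {
  // The least significant part sits at byte 0 on little-endian targets
  // and at the end of the value on big-endian targets.
  return big_endian ? inner_bytes - outer_bytes : 0;
}

constexpr std::size_t highpart_offset(std::size_t outer_bytes,
                                      std::size_t inner_bytes,
                                      bool big_endian) {
  // The most significant part is the mirror image of the lowpart.
  return big_endian ? 0 : inner_bytes - outer_bytes;
}

// DImode halves of a TImode value, as in the aarch64 scratch-reg helpers.
static_assert(lowpart_offset(8, 16, false) == 0, "LE low half");
static_assert(highpart_offset(8, 16, false) == 8, "LE high half");
static_assert(lowpart_offset(8, 16, true) == 8, "BE low half");
static_assert(highpart_offset(8, 16, true) == 0, "BE high half");
```

Under these assumptions the two wrappers differ only in which of these offsets they pass on to force_subreg, which matches the two function bodies in the patches.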
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On 14/06/2024 11:31, Richard Biener wrote:
> The following retires vcond{,u,eq} optabs by stopping to use them from the middle-end. Targets instead (should) implement vcond_mask and vec_cmp{,u,eq} optabs. The PR this change refers to lists possibly affected targets - those implementing these patterns, and in particular it lists mips, sparc and ia64 as targets that most definitely will regress while others might simply remove their vcond{,u,eq} patterns.
>
> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. I know riscv doesn't implement any of the legacy optabs. But less maintained vector targets might need adjustments.
>
> I want to get rid of those optabs for GCC 15. If I don't hear from you I will assume your target is fine.

Seems OK for GCN. The GCN vcond patterns are expanded directly to vec_cmp/vcond_mask, so the set of supported operations should be identical.

Andrew
Re: [PATCH] libstdc++: Do not use memset in _Hashtable buckets allocation
On Sat, 15 Jun 2024 at 14:04, François Dumont wrote: > > Here is the simplified patch then. The use of std::__to_address seems wrong. The allocator returns a __buckets_ptr, and that function returns a __buckets_ptr, so it should just be returned unchanged, not by converting to a raw pointer with __to_address. > > libstdc++: Do not use memset in _Hashtable buckets allocation > > Using memset is incorrect if the __bucket_ptr type is non-trivial, or > does not use an all-zero bit pattern for its null value. > > Replace the use of memset with std::__uinitialized_default_n to set the > pointers to nullptr. Doing so and corresponding std::_Destroy_n > when deallocating > buckets. > > libstdc++-v3/ChangeLog: > > * include/bits/hashtable_policy.h > (_Hashtable_alloc::_M_allocate_buckets): Do not use memset > to zero > out bucket pointers. > (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of > buckets. > > Tested under Linux x64, ok to commit ? > > François > > On 13/06/2024 20:58, Jonathan Wakely wrote: > > On Thu, 13 Jun 2024 at 19:57, Jonathan Wakely wrote: > >> On Thu, 13 Jun 2024 at 18:40, François Dumont wrote: > >>> Hi > >>> > >>> Following your recent change here: > >>> > >>> https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html > >>> > >>> I think we also need to fix the memset at bucket allocation level. > >>> > >>> I did it trying also to be more fancy pointer friendly by running > >>> __uninitialized_default_n_a on the allocator returned pointer rather > >>> than on the __to_address result. I wonder if an __uninitialized_fill_n_a > >>> would have been better ? Doing so I also had to call std::_Destroy on > >>> deallocation. Let me know if it is too early. > >> You don't need the RAII guard. Initializing Alloc::pointer isn't > >> allowed to throw exceptions: > >> > >> "An allocator type X shall meet the Cpp17CopyConstructible > >> requirements (Table 32). 
The XX::pointer, > >> XX::const_pointer, XX::void_pointer, and XX::const_void_pointer types > >> shall meet the Cpp17Nullable- > >> Pointer requirements (Table 36). No constructor, comparison operator > >> function, copy operation, move > >> operation, or swap operation on these pointer types shall exit via an > >> exception." > >> > >> And you should not pass the allocator to the __uninitialized_xxx call, > >> nor the _Destroy call. We don't want to use the allocator's > >> construct/destroy members for those pointers. They are not container > >> elements. > >> > >> I think either uninitialized_fill_n with nullptr or > >> __uninitialized_default_n is fine. Not the _a forms taking an > >> allocator though. > > And I'd use _Destroy_n(_M_buckets, _M_bucket_count) > > > > > >>> I also wonder if the compiler will be able to optimize it to a memset > >>> call ? I'm interested to work on it if you confirm that it won't. > >> It will do whatever is fastest, which might be memset or might be > >> vectorized code to zero it out (which is probably what libc memset > >> does too). > >> > >>> libstdc++: Do not use memset in _Hashtable buckets allocation > >>> > >>> Using memset is incorrect if the __bucket_ptr type is non-trivial, or > >>> does not use an all-zero bit pattern for its null value. > >>> > >>> Replace the use of memset with std::__uinitialized_default_n_a to set the > >>> pointers to nullptr. Doing so and corresponding std::_Destroy when > >>> deallocating > >>> buckets. > >>> > >>> libstdc++-v3/ChangeLog: > >>> > >>> * include/bits/hashtable_policy.h > >>> (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero > >>> out bucket pointers. > >>> (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets. > >>> > >>> > >>> I hope you won't ask for copy rights on the changelog entry :-) > >>> > >>> Tested under Linux x64, ok to commit ? > >>> > >>> François
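To make the memset concern concrete, here is a small sketch of a pointer-like type whose null value is not the all-zero bit pattern. `IndexPtr` and both helper functions are hypothetical and are not libstdc++ code. The sketch uses the public std::uninitialized_fill_n with nullptr, one of the two options mentioned above, and std::destroy_n, without going through the allocator's construct/destroy members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical fancy-pointer stand-in: an index where -1, not 0, is null.
struct IndexPtr {
  long idx = -1;
  IndexPtr() = default;
  IndexPtr(std::nullptr_t) : idx(-1) {}
  bool is_null() const { return idx == -1; }
};

// Count how many of the first n buckets hold the null value.
std::size_t count_nulls(const IndexPtr* buckets, std::size_t n) {
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (buckets[i].is_null())
      ++nulls;
  return nulls;
}

// Zero-filling: every bucket ends up with idx == 0, which is not null here.
std::size_t nulls_after_memset() {
  IndexPtr buckets[4];
  std::memset(static_cast<void*>(buckets), 0, sizeof buckets);
  return count_nulls(buckets, 4);
}

// Constructing each bucket from nullptr gives the type's real null value.
// The allocator is used for raw storage only, not for construct/destroy.
std::size_t nulls_after_fill() {
  std::allocator<IndexPtr> alloc;
  IndexPtr* buckets = alloc.allocate(4);
  std::uninitialized_fill_n(buckets, 4, nullptr);
  std::size_t nulls = count_nulls(buckets, 4);
  std::destroy_n(buckets, 4);
  alloc.deallocate(buckets, 4);
  return nulls;
}
```

With this toy type the memset version yields no null buckets at all, while the fill version yields four, which is the failure mode the commit message describes for pointer types without an all-zero null representation.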
Re: [PATCH] libstdc++: Do not use memset in _Hashtable buckets allocation
On Mon, 17 Jun 2024 at 11:18, Jonathan Wakely wrote: > > On Sat, 15 Jun 2024 at 14:04, François Dumont wrote: > > > > Here is the simplified patch then. > > The use of std::__to_address seems wrong. > > The allocator returns a __buckets_ptr, and that function returns a > __buckets_ptr, so it should just be returned unchanged, not by > converting to a raw pointer with __to_address. It was already wrong, but we should fix that now not keep it wrong. Using __to_address to get a pointer to pass to memset was correct. But the result of the __to_address call was used to initialize another __buckets_ptr variable. Which is what we already had before calling __to_address. It would have made sense like this: auto __ptr = __buckets_alloc_traits::allocate(__alloc, __bkt_count); auto* __p = std::__to_address(__ptr); __builtin_memset(__p, 0, __bkt_count * sizeof(__node_base_ptr)); return __ptr; i.e. __p should be a raw pointer (not a __buckets_ptr), and then it should return __ptr not __p. But that isn't what we had. Anyway, now that we're not using memset, we don't need any raw pointer at all, so don't need std::__to_address at all. > > > > > > libstdc++: Do not use memset in _Hashtable buckets allocation > > > > Using memset is incorrect if the __bucket_ptr type is non-trivial, or > > does not use an all-zero bit pattern for its null value. > > > > Replace the use of memset with std::__uinitialized_default_n to set the > > pointers to nullptr. Doing so and corresponding std::_Destroy_n > > when deallocating > > buckets. > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/hashtable_policy.h > > (_Hashtable_alloc::_M_allocate_buckets): Do not use memset > > to zero > > out bucket pointers. > > (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of > > buckets. > > > > Tested under Linux x64, ok to commit ? 
> > > > François > > > > On 13/06/2024 20:58, Jonathan Wakely wrote: > > > On Thu, 13 Jun 2024 at 19:57, Jonathan Wakely wrote: > > >> On Thu, 13 Jun 2024 at 18:40, François Dumont > > >> wrote: > > >>> Hi > > >>> > > >>> Following your recent change here: > > >>> > > >>> https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html > > >>> > > >>> I think we also need to fix the memset at bucket allocation level. > > >>> > > >>> I did it trying also to be more fancy pointer friendly by running > > >>> __uninitialized_default_n_a on the allocator returned pointer rather > > >>> than on the __to_address result. I wonder if an __uninitialized_fill_n_a > > >>> would have been better ? Doing so I also had to call std::_Destroy on > > >>> deallocation. Let me know if it is too early. > > >> You don't need the RAII guard. Initializing Alloc::pointer isn't > > >> allowed to throw exceptions: > > >> > > >> "An allocator type X shall meet the Cpp17CopyConstructible > > >> requirements (Table 32). The XX::pointer, > > >> XX::const_pointer, XX::void_pointer, and XX::const_void_pointer types > > >> shall meet the Cpp17Nullable- > > >> Pointer requirements (Table 36). No constructor, comparison operator > > >> function, copy operation, move > > >> operation, or swap operation on these pointer types shall exit via an > > >> exception." > > >> > > >> And you should not pass the allocator to the __uninitialized_xxx call, > > >> nor the _Destroy call. We don't want to use the allocator's > > >> construct/destroy members for those pointers. They are not container > > >> elements. > > >> > > >> I think either uninitialized_fill_n with nullptr or > > >> __uninitialized_default_n is fine. Not the _a forms taking an > > >> allocator though. > > > And I'd use _Destroy_n(_M_buckets, _M_bucket_count) > > > > > > > > >>> I also wonder if the compiler will be able to optimize it to a memset > > >>> call ? I'm interested to work on it if you confirm that it won't. 
> > >> It will do whatever is fastest, which might be memset or might be > > >> vectorized code to zero it out (which is probably what libc memset > > >> does too). > > >> > > >>> libstdc++: Do not use memset in _Hashtable buckets allocation > > >>> > > >>> Using memset is incorrect if the __bucket_ptr type is non-trivial, or > > >>> does not use an all-zero bit pattern for its null value. > > >>> > > >>> Replace the use of memset with std::__uinitialized_default_n_a to set > > >>> the > > >>> pointers to nullptr. Doing so and corresponding std::_Destroy when > > >>> deallocating > > >>> buckets. > > >>> > > >>> libstdc++-v3/ChangeLog: > > >>> > > >>> * include/bits/hashtable_policy.h > > >>> (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero > > >>> out bucket pointers. > > >>> (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets. > > >>> > > >>> > > >>> I hope you won't ask for copy rights on the changelog entry :-) > > >>> > > >>> Tested under Linux x64, ok to commit ? > > >>> > > >>> François
Re: [PATCH 06/52] m2: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
on 2024/6/15 13:00, Gaius Mulley wrote: > "Kewen.Lin" writes: > >> Hi Gaius, >> static tree build_m2_short_real_node (void) { - tree c; - - /* Define `REAL'. */ - - c = make_node (REAL_TYPE); - TYPE_PRECISION (c) = FLOAT_TYPE_SIZE; - layout_type (c); - return c; + /* Define `SHORTREAL'. */ + layout_type (float_type_node); >>> >>> It looks that float_type_node, double_type_node, float128_type_node and >>> long_double_type_node have been called with layout_type when they are >>> being initialized in function build_common_tree_nodes, maybe we can just >>> assert their TYPE_SIZE. >> >> I just noticed that latest trunk still has {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE >> in gcc/m2 and realized that my comment above was misleading, sorry about >> that. >> It meant TYPE_SIZE (float_type_node) etc. instead of >> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, >> as this patch series would like to get rid of >> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE. >> >> I adjusted them as below patch, does this look good to you? > > Hi Kewen, > > ah yes indeed, lgtm, Thanks Gaius! Pushed as r15-1362-g96fe23eb8a9eba. BR, Kewen
[COMMITTED] Rename Value_Range to value_range.
Now that all remaining users of value_range have been renamed to int_range<>, we can reclaim value_range as a temporary, thus removing the annoying CamelCase. gcc/ChangeLog: * data-streamer-in.cc (streamer_read_value_range): Rename Value_Range to value_range. * data-streamer.h (streamer_read_value_range): Same. * gimple-pretty-print.cc (dump_ssaname_info): Same. * gimple-range-cache.cc (ssa_block_ranges::dump): Same. (ssa_lazy_cache::merge): Same. (block_range_cache::dump): Same. (ssa_cache::merge_range): Same. (ssa_cache::dump): Same. (ranger_cache::edge_range): Same. (ranger_cache::propagate_cache): Same. (ranger_cache::fill_block_cache): Same. (ranger_cache::resolve_dom): Same. (ranger_cache::range_from_dom): Same. (ranger_cache::register_inferred_value): Same. * gimple-range-fold.cc (op1_range): Same. (op2_range): Same. (fold_relations): Same. (fold_using_range::range_of_range_op): Same. (fold_using_range::range_of_phi): Same. (fold_using_range::range_of_call): Same. (fold_using_range::condexpr_adjust): Same. (fold_using_range::range_of_cond_expr): Same. (fur_source::register_outgoing_edges): Same. * gimple-range-fold.h (gimple_range_type): Same. (gimple_range_ssa_p): Same. * gimple-range-gori.cc (gori_compute::compute_operand_range): Same. (gori_compute::logical_combine): Same. (gori_compute::refine_using_relation): Same. (gori_compute::compute_operand1_range): Same. (gori_compute::compute_operand2_range): Same. (gori_compute::compute_operand1_and_operand2_range): Same. (gori_calc_operands): Same. (gori_name_helper): Same. * gimple-range-infer.cc (gimple_infer_range::check_assume_func): Same. (gimple_infer_range::gimple_infer_range): Same. (infer_range_manager::maybe_adjust_range): Same. (infer_range_manager::add_range): Same. * gimple-range-infer.h: Same. * gimple-range-op.cc (gimple_range_op_handler::gimple_range_op_handler): Same. (gimple_range_op_handler::calc_op1): Same. (gimple_range_op_handler::calc_op2): Same. 
(gimple_range_op_handler::maybe_builtin_call): Same. * gimple-range-path.cc (path_range_query::internal_range_of_expr): Same. (path_range_query::ssa_range_in_phi): Same. (path_range_query::compute_ranges_in_phis): Same. (path_range_query::compute_ranges_in_block): Same. (path_range_query::add_to_exit_dependencies): Same. * gimple-range-trace.cc (debug_seed_ranger): Same. * gimple-range.cc (gimple_ranger::range_of_expr): Same. (gimple_ranger::range_on_entry): Same. (gimple_ranger::range_on_edge): Same. (gimple_ranger::range_of_stmt): Same. (gimple_ranger::prefill_stmt_dependencies): Same. (gimple_ranger::register_inferred_ranges): Same. (gimple_ranger::register_transitive_inferred_ranges): Same. (gimple_ranger::export_global_ranges): Same. (gimple_ranger::dump_bb): Same. (assume_query::calculate_op): Same. (assume_query::calculate_phi): Same. (assume_query::dump): Same. (dom_ranger::range_of_stmt): Same. * ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Same. (ipa_vr_operation_and_type_effects): Same. (ipa_value_range_from_jfunc): Same. (propagate_bits_across_jump_function): Same. (propagate_vr_across_jump_function): Same. (ipcp_store_vr_results): Same. * ipa-cp.h: Same. * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same. (evaluate_properties_for_edge): Same. * ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same. (ipa_vr::get_vrange): Same. (ipa_vr::streamer_read): Same. (ipa_vr::streamer_write): Same. (ipa_vr::dump): Same. (ipa_set_jfunc_vr): Same. (ipa_compute_jump_functions_for_edge): Same. (ipcp_get_parm_bits): Same. (ipcp_update_vr): Same. (ipa_record_return_value_range): Same. (ipa_return_value_range): Same. * ipa-prop.h (ipa_return_value_range): Same. (ipa_record_return_value_range): Same. * range-op.h (range_cast): Same. * tree-ssa-dom.cc (dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): Same. (cprop_operand): Same. * tree-ssa-loop-ch.cc (loop_static_stmt_p): Same. * tree-ssa-loop-niter.cc (record_nonwrapping_iv): Same. 
* tree-ssa-loop-split.cc (split_at_bb_p): Same. * tree-ssa-phiopt.cc (value_replacement): Same. * tree-ssa-strlen.cc (get_range): Same. * tree-ssa-threadedge.cc (hybrid_jt_simplifier::simplify): Same. (hybrid_jt_simplifier::compute_exit_dependencies): Same. * tree-ssanames.cc (set_rang
[PATCH] c++, contracts: Ensure return statements on checkers.
This is a minor tidy-up, tested on x86_64-darwin, OK for trunk? thanks Iain --- 8< --- At present, for pre-conditions and for post-conditions with a void return, we are not emitting a return statement. This patch adds the relevant return statements. gcc/cp/ChangeLog: * contracts.cc (finish_function_contracts): Add return statements to pre-condition and void post-condition checking functions. Signed-off-by: Iain Sandoe --- gcc/cp/contracts.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc index 634e3cf4fa9..0822624a910 100644 --- a/gcc/cp/contracts.cc +++ b/gcc/cp/contracts.cc @@ -2052,6 +2052,7 @@ finish_function_contracts (tree fndecl) DECL_PENDING_INLINE_P (pre) = false; start_preparsed_function (pre, DECL_ATTRIBUTES (pre), flags); remap_and_emit_conditions (fndecl, pre, PRECONDITION_STMT); + finish_return_stmt (NULL_TREE); tree finished_pre = finish_function (false); expand_or_defer_fn (finished_pre); } @@ -2065,6 +2066,8 @@ finish_function_contracts (tree fndecl) remap_and_emit_conditions (fndecl, post, POSTCONDITION_STMT); if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (post)))) finish_return_stmt (get_postcondition_result_parameter (fndecl)); + else + finish_return_stmt (NULL_TREE); tree finished_post = finish_function (false); expand_or_defer_fn (finished_post); -- 2.39.2 (Apple Git-143)
[PATCH] c++, coroutines, contracts: Handle coroutine and void functions [PR110871, PR110872, PR115434].
This patch came out of a discussion on Mattermost about how to deal with contracts/coroutines integration. Actually, it would also allow some semantic checking to be deferred until the same spot - at which time there are no dependent types, which can simplify the process. NOTE: this is a fix for bugs in the existing '2a' contracts impl. it does not attempt to make any of the changes required by P2900 to either code-gen or constexpr handling. Tested on x86_64-darwin, so far, OK for trunk if testing succeeds on x86_64/powerpc64 linux too? thanks, Iain --- 8< --- The current implementation of contracts emits the checks into function bodies in three places; for pre-conditions at the start of the body, for asserts in-line in the function body and for post-conditions as an addition to return statements. In general (at least with existing "2a" contract semantics) the in-line contract asserts behave as expected. However, the mechanism is not applicable to: * Handling pre conditions in coroutines since, for those, the standard specifies a wrapping of the original function body by functionality implementing initial and final suspends (along with some housekeeping to route exceptions). Thus for such transformed function bodies, the preconditions then get actioned after the initial suspend, which does not behave as intended. * Handling post conditions in functions that do not have return statements (which applies to coroutines and void functions). In the following, we identify a potentially transformed function body (in the case of coroutines, this is usually called the "ramp()" function). 
The patch here re-implements the code insertion in one of the two following ways (code for exposition only): * For functions with no post-conditions we wrap the potentially transformed function as follows: { handle_pre_condition_checking (); potentially_transformed_function_body (); } This implements the intent that the preconditions are processed after the function parameters are initialised but before any other actions. * For functions with post-conditions: try { if (preconditions_exist) handle_pre_condition_checking (); potentially_transformed_function_body (); } finally { handle_post_condition_checking (); } else [only if the function is not marked noexcept(true) ] { __rethrow (); } In this, post-conditions [that might apply to the return value etc.] are evaluated on every non-exceptional edge out of the function. At present, the model here is that exceptions thrown by the function propagate upwards as if there were no contracts present. If the desired semantic becomes that an exception is counted as equivalent to a contract violation - then we can add a second handler in place of the rethrow. At constexpr time we need to evaluate the contract conditions, but not the exceptional path, which is handled by a flag on the EH_ELSE_EXPR that indicates it is in use for contract handling. This patch specifically does not address changes to code-gen and constexpr handling that are contained in P2900. PR c++/115434 PR c++/110871 PR c++/110872 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR. * contracts.cc (finish_contract_attribute): Remove excess line. (build_contract_condition_function): Post condition handlers are void now. (emit_postconditions_cleanup): Remove. (emit_postconditions): New. (add_pre_condition_fn_call): New. (add_post_condition_fn_call): New. (apply_preconditions): New. (apply_postconditions): New. (maybe_apply_function_contracts): New. (apply_postcondition_to_return): Remove. 
	* contracts.h (apply_postcondition_to_return): Remove.
	(maybe_apply_function_contracts): Add.
	* coroutines.cc (coro_build_actor_or_destroy_function): Do not
	copy contracts to coroutine helpers.
	* cp-tree.h (CONTRACT_EH_ELSE_P): New.
	* decl.cc (finish_function): Handle wrapping a possibly
	transformed function body in contract checks.
	* typeck.cc (check_return_expr): Remove handling of post
	conditions on return expressions.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_ctx): Add a flag to show we are
	expanding a handler.
	(gimplify_expr): When we are expanding a handler, and the body
	transforms might have re-written DECL_RESULT into a gimple var,
	ensure that handler references to DECL_RESULT are also re-written
	to refer to the gimple var.

gcc/testsuite/ChangeLog:

	* g++.dg/contracts/pr115434.C: New test.
	* g++.dg/coroutines/pr110871.C: New test.
	* g++.dg/coroutines/pr110872.C: New test.

Signed-off-by: Iain Sandoe
---
 gcc/cp/constexpr.cc| 16 ++
 gcc/cp/contracts.cc| 249
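For readers following along, the non-exceptional path of the wrapping described above can be modelled in plain C. This is only a sketch under stated assumptions: C has no exceptions, so the rethrow arm is absent, and the trace buffer, `note` and `wrapped_function` are hypothetical helpers that are not part of the patch. The hook names are borrowed from the exposition-only pseudocode in the cover letter.

```c
#include <string.h>

/* Hypothetical trace buffer recording the order in which the hooks run.  */
static char trace[64];

static void note (const char *s) { strcat (trace, s); }

/* Stand-ins for the exposition-only names used in the cover letter.  */
static void handle_pre_condition_checking (void) { note ("pre;"); }
static void handle_post_condition_checking (void) { note ("post;"); }

/* A void body with no return statement, the case the old
   "attach to return statements" scheme could not cover.  */
static void potentially_transformed_function_body (void) { note ("body;"); }

/* Model of the wrapped function on the non-exceptional edge only.  */
static const char *
wrapped_function (void)
{
  trace[0] = '\0';
  handle_pre_condition_checking ();
  potentially_transformed_function_body ();
  handle_post_condition_checking ();
  return trace;
}
```

The point of the model is just the ordering: the post-condition hook runs even though the body contains no return statement.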
[PATCH] tree-optimization/115508 - fix ICE with SLP scheduling and extern vector
When there's a permute after an extern vector we can run into a case
that didn't consider the scheduled node being a permute which lacks
a representative.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/115508
	* tree-vect-slp.cc (vect_schedule_slp_node): Guard check on
	representative.

	* gcc.target/i386/pr115508.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr115508.c | 15 +++
 gcc/tree-vect-slp.cc | 1 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115508.c

diff --git a/gcc/testsuite/gcc.target/i386/pr115508.c b/gcc/testsuite/gcc.target/i386/pr115508.c
new file mode 100644
index 000..a97b2007f7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115508.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=znver1" } */
+
+typedef long long v4di __attribute__((vector_size(4 * sizeof (long long))));
+
+v4di vec_var;
+extern long long array1[];
+long long g(void)
+{
+  int total_error_4 = 0;
+  total_error_4 += array1 [0] + array1 [1] + array1 [2] + array1 [3];
+  v4di t = vec_var;
+  long long iorvar = t [1] | t [0] | t [2] | t [3];
+  return iorvar + total_error_4;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 38e7fadb679..6ef04b14dd8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9674,6 +9674,7 @@ vect_schedule_slp_node (vec_info *vinfo,
           si = gsi_after_labels (vinfo->bbs[0]);
         }
       else if (is_a (vinfo)
+               && SLP_TREE_CODE (node) != VEC_PERM_EXPR
                && gimple_bb (last_stmt) != gimple_bb (stmt_info->stmt)
                && gimple_could_trap_p (stmt_info->stmt))
         {
-- 
2.35.3
Re: [RFC PATCH] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores
Hi Siarhei,

On 16/06/2024 09:51, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
>
> E.g.
>  ldr r0, [r3, #0]
>  ldr r1, [r3, #4]  @ r3 is dead after
> will be replaced by
>  ldmia r3!, {r0, r1}
>
> also for reused reg is legal to:
>  ldr r2, [r3, #0]
>  ldr r3, [r3, #4]  @ r3 reused
> will be replaced by
>  ldmia r3, {r2, r3}
>
> However, I know little about other thumb CPUs except Cortex M0/M0+.
> 1. Is there any drawbacks if optimizing speed?
> 2. Might it be profitable for thumb2?

I like the idea behind this patch, but I think I'd try first doing this
as a peephole2 rule to rewrite the address in this case.  That has the
additional advantage that we then estimate the size of the instruction
more accurately.  I think it would then be easy to extend this to thumb2
as well if it looks like a win (perhaps only for -Os in the thumb2 case).

>
> Regarding code size with the patch gives for v6-m/nofp:
>    libgcc:  -52 bytes / -0.10%
> Newlib's libc:  -68 bytes / -0.03%
>      libm:  -96 bytes / -0.10%
> libstdc++: -140 bytes / -0.02%
>
> Also I have questions regarding testing the patch.
> It's obscure how to do it properly, for now I compile
> for arm-none-eabi target and make check seems failing
> on any compilable test due to missing symbols from libnosys.
> I guess that arm-gnu-elf is the correct triple but it still
> advisable for proper commands to make & run the testsuite.

For testing, I'd start with something like
gcc/testsuite/gcc.target/arm/thumb-andsi.c as a template and adapt that
for your specific case.  Matching something like "ldmia\tr[0-7]!,"
should be enough.

R.
> > Signed-off-by: Siarhei Volkau > --- > gcc/config/arm/arm-protos.h | 2 +- > gcc/config/arm/arm.cc | 7 ++- > gcc/config/arm/thumb1.md| 10 -- > 3 files changed, 15 insertions(+), 4 deletions(-) > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 2cd560c9925..548bfbaccdc 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -254,7 +254,7 @@ extern int thumb_shiftable_const (unsigned HOST_WIDE_INT); > extern enum arm_cond_code maybe_get_arm_condition_code (rtx); > extern void thumb1_final_prescan_insn (rtx_insn *); > extern void thumb2_final_prescan_insn (rtx_insn *); > -extern const char *thumb_load_double_from_address (rtx *); > +extern const char *thumb_load_double_from_address (rtx *, rtx_insn *); > extern const char *thumb_output_move_mem_multiple (int, rtx *); > extern const char *thumb_call_via_reg (rtx); > extern void thumb_expand_cpymemqi (rtx *); > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc > index b8c32db0a1d..73c2478ed77 100644 > --- a/gcc/config/arm/arm.cc > +++ b/gcc/config/arm/arm.cc > @@ -28350,7 +28350,7 @@ thumb1_output_interwork (void) > a computed memory address. The computed address may involve a > register which is overwritten by the load. 
*/ > const char * > -thumb_load_double_from_address (rtx *operands) > +thumb_load_double_from_address (rtx *operands, rtx_insn *insn) > { >rtx addr; >rtx base; > @@ -28368,6 +28368,11 @@ thumb_load_double_from_address (rtx *operands) >switch (GET_CODE (addr)) > { > case REG: > + if (find_reg_note (insn, REG_DEAD, addr)) > +return "ldmia\t%m1!, {%0, %H0}"; > + else if (REGNO (addr) == REGNO (operands[0]) + 1) > +return "ldmia\t%m1, {%0, %H0}"; > + >operands[2] = adjust_address (operands[1], SImode, 4); > >if (REGNO (operands[0]) == REGNO (addr)) > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > index d7074b43f60..8da6887b560 100644 > --- a/gcc/config/arm/thumb1.md > +++ b/gcc/config/arm/thumb1.md > @@ -637,8 +637,11 @@ > case 5: >return \"stmia\\t%0, {%1, %H1}\"; > case 6: > - return thumb_load_double_from_address (operands); > + return thumb_load_double_from_address (operands, insn); > case 7: > + if (MEM_P (operands[0]) && REG_P (XEXP (operands[0], 0)) > + && find_reg_note (insn, REG_DEAD, XEXP (operands[0], 0))) > +return \"stmia\\t%m0!, {%1, %H1}\"; >operands[2] = gen_rtx_MEM (SImode, >plus_constant (Pmode, XEXP (operands[0], 0), 4)); >output_asm_insn (\"str\\t%1, %0\;str\\t%H1, %2\", operands); > @@ -970,8 +973,11 @@ > case 2: >return \"stmia\\t%0, {%1, %H1}\"; > case 3: > - return thumb_load_double_from_address (operands); > + return thumb_load_double_from_address (operands, insn); > case 4: > + if (MEM_P (operands[0]) && REG_P (XEXP (operands[0], 0)) > + && find_reg_note (insn, REG_DEAD, XEXP (operands[0], 0))) > +return \"stmia\\t%m0!, {%1, %H1}\"; >operands[2] = gen_rtx_MEM (SImode, >plus_const
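To make the equivalence in the quoted proposal concrete, here is a small C model of the two load sequences. It is a sketch only: the function names are hypothetical, the "registers" are ordinary C variables, and nothing here reflects how arm.cc emits code. It shows that an LDR pair and a write-back LDMIA load the same two words and differ only in the final base value, which is why the base register has to be dead (or overwritten by the load) for the substitution to be safe.

```c
#include <stdint.h>

/* Model of "ldr lo, [base, #0]; ldr hi, [base, #4]": base is unchanged.  */
static void
load_pair_ldr (const uint32_t *base, uint32_t *lo, uint32_t *hi)
{
  *lo = base[0];
  *hi = base[1];
}

/* Model of "ldmia base!, {lo, hi}": same loads, then base advances by
   two words (8 bytes) because of the write-back.  */
static void
load_pair_ldmia_wb (const uint32_t **base, uint32_t *lo, uint32_t *hi)
{
  *lo = (*base)[0];
  *hi = (*base)[1];
  *base += 2;
}

/* Returns 1 when both sequences load identical values and the write-back
   form leaves the base pointing just past the pair.  */
static int
pair_loads_agree (void)
{
  uint32_t mem[2] = { 0x11111111u, 0x22222222u };
  uint32_t lo1, hi1, lo2, hi2;
  const uint32_t *p = mem;

  load_pair_ldr (mem, &lo1, &hi1);
  load_pair_ldmia_wb (&p, &lo2, &hi2);
  return lo1 == lo2 && hi1 == hi2 && p == mem + 2;
}
```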
RE: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]
Hi,

> -Original Message-
> From: Pengxuan Zheng
> Sent: Friday, June 14, 2024 12:57 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Pengxuan Zheng
> Subject: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]
>
> This patch improves GCC’s vectorization of __builtin_popcount for aarch64
> target
> by adding popcount patterns for vector modes besides QImode, i.e., HImode,
> SImode and DImode.
>
> With this patch, we now generate the following for V8HI:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>
> For V4HI, we generate:
>   cnt v1.8b, v.8b
>   uaddlp  v2.4h, v1.8b
>
> For V4SI, we generate:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>
> For V2SI, we generate:
>   cnt v1.8b, v.8b
>   uaddlp  v2.4h, v1.8b
>   uaddlp  v3.2s, v2.4h
>
> For V2DI, we generate:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>   uaddlp  v4.2d, v3.4s

Nice patch!  We can do better for these sequences though.  Would you
instead consider using udot with a 0 accumulator and a multiplicand of 1?

Essentially

  movi v0.16b, #0
  movi v1.16b, #1
  cnt v3.16b, v2.16b
  udot v0.4s, v3.16b, v1.16b

This has one fewer instruction on the critical path, so it should be half
the latency of the uaddlp variants.

For the DI case you'll still need a final uaddlp.

Cheers,
Tamar

>
> PR target/113859
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (aarch64_addlp):
> Rename to...
> (@aarch64_addlp): ... This.
> (popcount2): New define_expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/popcnt-vec.c: New test.
> > Signed-off-by: Pengxuan Zheng > --- > gcc/config/aarch64/aarch64-simd.md| 28 +++- > gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++ > 2 files changed, 96 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64- > simd.md > index 0bb39091a38..ee73e13534b 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -3461,7 +3461,7 @@ (define_insn > "*aarch64_addlv_ze" >[(set_attr "type" "neon_reduc_add")] > ) > > -(define_expand "aarch64_addlp" > +(define_expand "@aarch64_addlp" >[(set (match_operand: 0 "register_operand") > (plus: > (vec_select: > @@ -3517,6 +3517,32 @@ (define_insn "popcount2" >[(set_attr "type" "neon_cnt")] > ) > > +(define_expand "popcount2" > + [(set (match_operand:VDQHSD 0 "register_operand") > +(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))] > + "TARGET_SIMD" > + { > +/* Generate a byte popcount. */ > +machine_mode mode = == 64 ? V8QImode : V16QImode; > +rtx tmp = gen_reg_rtx (mode); > +auto icode = optab_handler (popcount_optab, mode); > +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1]))); > + > +/* Use a sequence of UADDLPs to accumulate the counts. Each step doubles > + the element size and halves the number of elements. */ > +do > + { > +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp)); > +mode = insn_data[icode].operand[0].mode; > +rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode); > +emit_insn (GEN_FCN (icode) (dest, tmp)); > +tmp = dest; > + } > +while (mode != mode); > +DONE; > + } > +) > + > ;; 'across lanes' max and min ops. 
> > ;; Template for outputting a scalar, so we can create __builtins which can be > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > new file mode 100644 > index 000..0c4926d7ca8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > @@ -0,0 +1,69 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-vect-cost-model" } */ > + > +/* This function should produce cnt v.16b. */ > +void > +bar (unsigned char *__restrict b, unsigned char *__restrict d) > +{ > + for (int i = 0; i < 1024; i++) > +d[i] = __builtin_popcount (b[i]); > +} > + > +/* This function should produce cnt v.16b and uaddlp (Add Long Pairwise). */ > +void > +bar1 (unsigned short *__restrict b, unsigned short *__restrict d) > +{ > + for (int i = 0; i < 1024; i++) > +d[i] = __builtin_popcount (b[i]); > +} > + > +/* This function should produce cnt v.16b and 2 uaddlp (Add Long Pairwise). > */ > +void > +bar2 (unsigned int *__restrict b, unsigned int *__restrict d) > +{ > + for (int i = 0; i < 1024; i++) > +d[i] = __builtin_popcount (b[i]); > +} > + > +/* This function should produce cnt v.16b and 3 uaddlp (Add Long Pairwise). > */ > +void > +bar3 (unsigned long long *__restrict b, unsigned long long *__restrict d) > +{ > + for (int i = 0; i < 1024; i++) > +d[i] = __builtin_popcountll (b[i]); > +} > + > +/* SLP > + This function should produce cnt v.8b and uaddlp (Add Long Pairwise)
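As a sanity check on the two approaches discussed in this thread, the following scalar C model compares them for one 32-bit lane. It is a sketch with hypothetical function names, not the AArch64 expansion itself: `cnt_byte` plays the role of one CNT byte lane, the pairwise version mimics two widening pairwise adds, and the dot-product version mimics a UDOT against a vector of ones with a zero accumulator.

```c
#include <stdint.h>

/* Population count of one byte, the scalar analogue of a CNT lane.  */
static uint8_t
cnt_byte (uint8_t b)
{
  uint8_t n = 0;
  while (b)
    {
      n = (uint8_t) (n + (b & 1u));
      b >>= 1;
    }
  return n;
}

/* Byte counts followed by two widening pairwise additions:
   bytes -> halfwords -> word.  */
static uint32_t
popcount32_pairwise (uint32_t x)
{
  uint8_t c[4];
  for (int i = 0; i < 4; i++)
    c[i] = cnt_byte ((uint8_t) (x >> (8 * i)));

  uint16_t h0 = (uint16_t) (c[0] + c[1]);
  uint16_t h1 = (uint16_t) (c[2] + c[3]);
  return (uint32_t) h0 + h1;
}

/* Byte counts followed by a dot product with ones, accumulated into a
   zero-initialised 32-bit lane.  */
static uint32_t
popcount32_dot (uint32_t x)
{
  uint32_t acc = 0;
  for (int i = 0; i < 4; i++)
    acc += (uint32_t) cnt_byte ((uint8_t) (x >> (8 * i))) * 1u;
  return acc;
}
```

Both functions compute the same per-lane result; the difference Tamar points at is only in how many dependent vector instructions the real sequence needs.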
Ping^2 [PATCHv5] Optab: add isnormal_optab for __builtin_isnormal
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html

Thanks
Gui Haochen

On 2024/6/3 10:37, HAO CHEN GUI wrote:
> Hi,
>   All issues were addressed. Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html
>
> Thanks
> Gui Haochen
>
>
> On 2024/5/29 14:36, HAO CHEN GUI wrote:
>> Hi,
>>   This patch adds an optab for __builtin_isnormal. The normal check can be
>> implemented on rs6000 by a single instruction. It needs an optab to be
>> expanded to the certain sequence of instructions.
>>
>>   The subsequent patches will implement the expand on rs6000.
>>
>>   Compared to previous version, the main change is to specify return
>> value of the optab should be either 0 or 1.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> optab: Add isnormal_optab for isnormal builtin
>>
>> gcc/
>> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
>> for isnormal builtin.
>> * optabs.def (isnormal_optab): New.
>> * doc/md.texi (isnormal): Document.
>>
>>
>> patch.diff
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index 53e9d210541..89ba56abf17 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>builtin_optab = isfinite_optab;
>>break;
>>  case BUILT_IN_ISNORMAL:
>> +  builtin_optab = isnormal_optab;
>> +  break;
>>  CASE_FLT_FN (BUILT_IN_FINITE):
>>  case BUILT_IN_FINITED32:
>>  case BUILT_IN_FINITED64:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 3eb4216141e..4fd7da095fe 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point
>> number and 0
>>  otherwise.  @var{m} is a scalar floating point mode.  Operand 0
>>  has mode @code{SImode}, and operand 1 has mode @var{m}.
>> >> +@cindex @code{isnormal@var{m}2} instruction pattern >> +@item @samp{isnormal@var{m}2} >> +Return 1 if operand 1 is a normal floating point number and 0 >> +otherwise. @var{m} is a scalar floating point mode. Operand 0 >> +has mode @code{SImode}, and operand 1 has mode @var{m}. >> + >> @end table >> >> @end ifset >> diff --git a/gcc/optabs.def b/gcc/optabs.def >> index dcd77315c2a..3c401fc0b4c 100644 >> --- a/gcc/optabs.def >> +++ b/gcc/optabs.def >> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") >> OPTAB_D (ilogb_optab, "ilogb$a2") >> OPTAB_D (isinf_optab, "isinf$a2") >> OPTAB_D (isfinite_optab, "isfinite$a2") >> +OPTAB_D (isnormal_optab, "isnormal$a2") >> OPTAB_D (issignaling_optab, "issignaling$a2") >> OPTAB_D (ldexp_optab, "ldexp$a3") >> OPTAB_D (log10_optab, "log10$a2")
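The documented contract of the new pattern (return 1 for a normal number, 0 otherwise) can be illustrated at the source level. This is a minimal sketch assuming a GCC-compatible compiler that provides `__builtin_isnormal`; the wrapper name `isnormal_01` is hypothetical and simply normalises the builtin's result to exactly 0 or 1, mirroring the wording added to md.texi.

```c
#include <float.h>
#include <math.h>

/* Hypothetical wrapper: 1 if X is a normal floating point number
   (not zero, subnormal, infinite or NaN), 0 otherwise.  */
static int
isnormal_01 (double x)
{
  return __builtin_isnormal (x) != 0;
}
```

Zero, subnormals such as `DBL_MIN / 2`, infinities and NaNs all yield 0, while ordinary values such as 1.0 and `DBL_MIN` itself yield 1.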
[pushed] doc: Mark up __cxa_atexit as @code.
Pushed. (The diff is a bit larger due to line breaks.)

Gerald


gcc:
	* doc/install.texi (Configuration): Mark up __cxa_atexit as @code.
---
 gcc/doc/install.texi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 298031dc2de..1774a010889 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1779,12 +1779,12 @@ Produce code conforming to version 20191213.
 In the absence of this configuration option the default version is 20191213.
 
 @item --enable-__cxa_atexit
-Define if you want to use __cxa_atexit, rather than atexit, to
+Define if you want to use @code{__cxa_atexit}, rather than atexit, to
 register C++ destructors for local statics and global objects.
 This is essential for fully standards-compliant handling of
-destructors, but requires __cxa_atexit in libc.  This option is currently
-only available on systems with GNU libc.  When enabled, this will cause
-@option{-fuse-cxa-atexit} to be passed by default.
+destructors, but requires @code{__cxa_atexit} in libc.  This option is
+currently only available on systems with GNU libc.  When enabled, this
+will cause @option{-fuse-cxa-atexit} to be passed by default.
 
 @item --enable-gnu-indirect-function
 Define if you want to enable the @code{ifunc} attribute.  This option is
-- 
2.45.2
[to-be-committed][RISC-V] Handle zero_extract destination for single bit insertions
Combine will use zero_extract destinations for certain bitfield
insertions.  If the bitfield is a single bit constant, then we can use
bset/bclr.

In this case we are only dealing with word_mode objects, so we don't have
to worry about the SI->DI extension issues for TARGET_64BIT.

The testcase was derived from 502.gcc in spec from the RAU team.

An earlier version of this (TARGET_64BIT only) went through Ventana's CI
system.  This version has gone through mine after generalizing it to
handle rv32 as well.  I'll wait for pre-commit CI to render its verdict
before moving forward.

Jeff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 311f0d373c0..c6bd55c53f9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -654,6 +654,18 @@ (define_split
    (any_or:DI (ashift:DI (const_int 1) (match_dup 1))
               (match_dup 3)))])
 
+;; Yet another form of a bset/bclr that can be created by combine.
+(define_insn "*bsetclr_zero_extract"
+  [(set (zero_extract:X (match_operand:X 0 "register_operand" "+r")
+                        (const_int 1)
+                        (zero_extend:X (match_operand:QI 1 "register_operand" "r")))
+        (match_operand 2 "immediate_operand" "n"))]
+  "TARGET_ZBS
+   && (operands[2] == CONST0_RTX (mode)
+       || operands[2] == CONST1_RTX (mode))"
+  { return operands[2] == CONST0_RTX (mode) ? "bclr\t%0,%0,%1" : "bset\t%0,%0,%1"; }
+  [(set_attr "type" "bitmanip")])
+
 (define_insn "*bclr"
   [(set (match_operand:X 0 "register_operand" "=r")
         (and:X (rotate:X (const_int -2)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-zext-3.c b/gcc/testsuite/gcc.target/riscv/zbs-zext-3.c
new file mode 100644
index 000..0239014e06b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-zext-3.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zba_zbb_zbs -mabi=ilp32" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+/* We need to adjust the constant so this works for rv32 and rv64.  */
+#if __riscv_xlen == 32
+#define ONE 1U
+#else
+#define ONE 1ULL
+#endif
+
+void add_to_hard_reg_set(long long *a, unsigned int count) {
+  int i = 0;
+  while(i++ < count)
+    *a |= (1U << i);
+}
+
+void remove_from_hard_reg_set(long long *a, unsigned int count) {
+  int i = 0;
+  while(i++ < count)
+    *a &= ~(ONE << i);
+}
+
+
+/* { dg-final { scan-assembler-not "and\t" } } */
+/* { dg-final { scan-assembler-not "andn\t" } } */
Patch ping
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653573.html
patch.

While the committed and backported patch fixed PCH on PIE cc1/cc1plus
etc. on PowerPC, it increased the size of the
rs6000_init_generated_builtins function quite a lot.  The above patch
shrinks it again, to even less than the size of the function before my
fix.

	Jakub
[PATCH] tree-optimization/115493 - fix wrong code with SLP induction cond reduction
The following fixes a bad final value being used when doing single-lane
SLP integer induction cond reduction vectorization.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	PR tree-optimization/115493
	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
	the first scalar result.
---
 gcc/tree-vect-loop.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d9a2ad69484..7c79e9da106 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6843,8 +6843,8 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          with the original initial value, unless induc_val is
          the same as initial_def already.  */
       tree zcompare = make_ssa_name (boolean_type_node);
-      epilog_stmt = gimple_build_assign (zcompare, EQ_EXPR, new_temp,
-                                         induc_val);
+      epilog_stmt = gimple_build_assign (zcompare, EQ_EXPR,
+                                         scalar_results[0], induc_val);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
       tree initial_def = reduc_info->reduc_initial_values[0];
       tree tmp = make_ssa_name (new_scalar_dest);
-- 
2.35.3
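For context, the kind of scalar loop that, as far as I understand it, gets classified as an integer induction cond reduction looks like the sketch below. The function name is hypothetical and this is not the testcase from the PR. The reduced value is the induction variable itself, assigned under a condition, so after vectorization the epilogue has to compare the final reduced scalar against a sentinel (the induc_val mentioned in the hunk above) and fall back to the original initial value when no iteration matched.

```c
/* Returns the last index I with A[I] < LIMIT, or -1 when there is none.
   The conditionally assigned value is the loop induction variable.  */
static int
last_index_below (const int *a, int n, int limit)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    if (a[i] < limit)
      last = i;
  return last;
}
```

The "no match" case, where the initial value -1 must survive, is exactly the path the epilogue comparison exists to handle.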
[PATCH][v2] Enhance if-conversion for automatic arrays
Automatic arrays that are not address-taken should not be subject to
store data races.  This applies to the result array of OMP SIMD
in-branch lowered functions, which for the testcase otherwise prevents
vectorization with SSE and, for AVX and AVX512, leaves a spurious
.MASK_STORE to the stack in the generated code.

This inefficiency was noted in PR111793.

I've introduced ref_can_have_store_data_races, commonizing uses
of flag_store_data_races in if-conversion, cselim and store motion.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

	PR tree-optimization/111793
	* tree-ssa-alias.h (ref_can_have_store_data_races): Declare.
	* tree-ssa-alias.cc (ref_can_have_store_data_races): New
	function.
	* tree-if-conv.cc (ifcvt_memrefs_wont_trap): Use
	ref_can_have_store_data_races to allow more unconditional
	stores.
	* tree-ssa-loop-im.cc (execute_sm): Likewise.
	* tree-ssa-phiopt.cc (cond_store_replacement): Likewise.

	* gcc.dg/vect/vect-simd-clone-21.c: New testcase.
---
 .../gcc.dg/vect/vect-simd-clone-21.c | 16 
 gcc/tree-if-conv.cc | 11 +--
 gcc/tree-ssa-alias.cc | 19 +++
 gcc/tree-ssa-alias.h | 2 ++
 gcc/tree-ssa-loop-im.cc | 2 +-
 gcc/tree-ssa-phiopt.cc| 4 +---
 6 files changed, 44 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
new file mode 100644
index 000..49c52fb59bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#pragma omp declare simd simdlen(4) inbranch
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target i?86-*-* x86_64-*-* } } } */
+/* if-conversion shouldn't need to resort to masked stores for the result
+   array created by OMP lowering since that's automatic
and does not have + its address taken. */ +/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */ diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index c4c3ed41a44..57992b6deca 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -936,12 +936,11 @@ ifcvt_memrefs_wont_trap (gimple *stmt, vec drs) /* an unconditionaly write won't trap if the base is written to unconditionally. */ - if (base_master_dr - && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) - return flag_store_data_races; - /* or the base is known to be not readonly. */ - else if (base_object_writable (DR_REF (a))) - return flag_store_data_races; + if ((base_master_dr + && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) + /* or the base is known to be not readonly. */ + || base_object_writable (DR_REF (a))) + return !ref_can_have_store_data_races (base); } return false; diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc index 1a91d63a31e..fab048b0b59 100644 --- a/gcc/tree-ssa-alias.cc +++ b/gcc/tree-ssa-alias.cc @@ -3704,6 +3704,25 @@ stmt_kills_ref_p (gimple *stmt, tree ref) return stmt_kills_ref_p (stmt, &r); } +/* Return whether REF can be subject to store data races. */ + +bool +ref_can_have_store_data_races (tree ref) +{ + /* With -fallow-store-data-races do not care about them. */ + if (flag_store_data_races) +return false; + + tree base = get_base_address (ref); + if (auto_var_p (base) + && ! may_be_aliased (base)) +/* Automatic variables not aliased are not subject to + data races. 
*/ +return false; + + return true; +} + /* Walk the virtual use-def chain of VUSE until hitting the virtual operand TARGET or a statement clobbering the memory reference REF in which diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h index 5cd64e72295..5834533ae9c 100644 --- a/gcc/tree-ssa-alias.h +++ b/gcc/tree-ssa-alias.h @@ -144,6 +144,8 @@ extern bool call_may_clobber_ref_p (gcall *, tree, bool = true); extern bool call_may_clobber_ref_p_1 (gcall *, ao_ref *, bool = true); extern bool stmt_kills_ref_p (gimple *, tree); extern bool stmt_kills_ref_p (gimple *, ao_ref *); +extern bool ref_can_have_store_data_races (tree); + enum translate_flags { TR_TRANSLATE, TR_VALUEIZE_AND_DISAMBIGUATE, TR_DISAMBIGUATE }; extern tree get_continuation_for_phi (gimple *, ao_ref *, bool, diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc index f3fda2bd7ce..3acbd886a0d 100644 --- a/gcc/tree-ssa-loop-im.cc +++ b/gcc/tree-ssa-loop-im.cc @@ -2298,7 +2298,7 @@ execute_sm (class loop *loop, im_mem_ref *ref, bool always_stored = ref_always_access
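To illustrate why a non-escaping automatic array is safe here, the sketch below shows a conditional store to a local array next to its if-converted form, where the store becomes unconditional and only the stored value is selected. The function names are hypothetical and this is a source-level model, not what the pass emits. Since no other thread can observe `result`, writing back the old value on the "not taken" path cannot introduce a data race, so no masked store should be needed.

```c
/* Original form: the store to RESULT happens only when MASK[I] is set.  */
static int
sum_conditional (const int *a, const int *b, const int *mask)
{
  int result[4] = { 0, 0, 0, 0 };	/* automatic, address never escapes */
  for (int i = 0; i < 4; i++)
    if (mask[i])
      result[i] = a[i] + b[i];
  return result[0] + result[1] + result[2] + result[3];
}

/* If-converted form: the store is unconditional and the value is
   selected, rewriting the old contents when MASK[I] is clear.  */
static int
sum_if_converted (const int *a, const int *b, const int *mask)
{
  int result[4] = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    result[i] = mask[i] ? a[i] + b[i] : result[i];
  return result[0] + result[1] + result[2] + result[3];
}
```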
Re: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]
On Mon, Jun 17, 2024, 5:59 AM Tamar Christina wrote: > Hi, > > > -Original Message- > > From: Pengxuan Zheng > > Sent: Friday, June 14, 2024 12:57 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Pengxuan Zheng > > Subject: [PATCH v3] aarch64: Add vector popcount besides QImode > [PR113859] > > > > This patch improves GCC’s vectorization of __builtin_popcount for > aarch64 target > > by adding popcount patterns for vector modes besides QImode, i.e., > HImode, > > SImode and DImode. > > > > With this patch, we now generate the following for V8HI: > > cnt v1.16b, v.16b > > uaddlp v2.8h, v1.16b > > > > For V4HI, we generate: > > cnt v1.8b, v.8b > > uaddlp v2.4h, v1.8b > > > > For V4SI, we generate: > > cnt v1.16b, v.16b > > uaddlp v2.8h, v1.16b > > uaddlp v3.4s, v2.8h > > > > For V2SI, we generate: > > cnt v1.8b, v.8b > > uaddlp v2.4h, v1.8b > > uaddlp v3.2s, v2.4h > > > > For V2DI, we generate: > > cnt v1.16b, v.16b > > uaddlp v2.8h, v1.16b > > uaddlp v3.4s, v2.8h > > uaddlp v4.2d, v3.4s > > Nice patch! We can do better for these sequences though. Would you > instead consider using udot with a 0 accumulator and 1 multiplicatent. > > Essentially > movi v0.16b, #0 > movi v1.16b, #1 > cnt v3.16b, v2.16b > udot v0.4s, v3.16b, v1.16b > > this has 1 instruction less on the critical path so should be half the > latency of the uaddlp variants. > Of course that can only be done if the udot is enabled. But yes I agree that is better. > For the DI case you'll still need a final uaddlp. > > Cheers, > Tamar > > > > > PR target/113859 > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-simd.md (aarch64_addlp): > > Rename to... > > (@aarch64_addlp): ... This. > > (popcount2): New define_expand. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/popcnt-vec.c: New test. 
> > > > Signed-off-by: Pengxuan Zheng > > --- > > gcc/config/aarch64/aarch64-simd.md| 28 +++- > > gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++ > > 2 files changed, 96 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64- > > simd.md > > index 0bb39091a38..ee73e13534b 100644 > > --- a/gcc/config/aarch64/aarch64-simd.md > > +++ b/gcc/config/aarch64/aarch64-simd.md > > @@ -3461,7 +3461,7 @@ (define_insn > > "*aarch64_addlv_ze" > >[(set_attr "type" "neon_reduc_add")] > > ) > > > > -(define_expand "aarch64_addlp" > > +(define_expand "@aarch64_addlp" > >[(set (match_operand: 0 "register_operand") > > (plus: > > (vec_select: > > @@ -3517,6 +3517,32 @@ (define_insn "popcount2" > >[(set_attr "type" "neon_cnt")] > > ) > > > > +(define_expand "popcount2" > > + [(set (match_operand:VDQHSD 0 "register_operand") > > +(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))] > > + "TARGET_SIMD" > > + { > > +/* Generate a byte popcount. */ > > +machine_mode mode = == 64 ? V8QImode : V16QImode; > > +rtx tmp = gen_reg_rtx (mode); > > +auto icode = optab_handler (popcount_optab, mode); > > +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1]))); > > + > > +/* Use a sequence of UADDLPs to accumulate the counts. Each step > doubles > > + the element size and halves the number of elements. */ > > +do > > + { > > +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE > (tmp)); > > +mode = insn_data[icode].operand[0].mode; > > +rtx dest = mode == mode ? operands[0] : gen_reg_rtx > (mode); > > +emit_insn (GEN_FCN (icode) (dest, tmp)); > > +tmp = dest; > > + } > > +while (mode != mode); > > +DONE; > > + } > > +) > > + > > ;; 'across lanes' max and min ops. 
> > > > ;; Template for outputting a scalar, so we can create __builtins which > can be > > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > > b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > > new file mode 100644 > > index 000..0c4926d7ca8 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c > > @@ -0,0 +1,69 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fno-vect-cost-model" } */ > > + > > +/* This function should produce cnt v.16b. */ > > +void > > +bar (unsigned char *__restrict b, unsigned char *__restrict d) > > +{ > > + for (int i = 0; i < 1024; i++) > > +d[i] = __builtin_popcount (b[i]); > > +} > > + > > +/* This function should produce cnt v.16b and uaddlp (Add Long > Pairwise). */ > > +void > > +bar1 (unsigned short *__restrict b, unsigned short *__restrict d) > > +{ > > + for (int i = 0; i < 1024; i++) > > +d[i] = __builtin_popcount (b[i]); > > +} > > + > > +/* This function should produce cnt v.16b and 2 uaddlp (Add Long > Pairwise). */ > > +void > > +bar2 (unsigned int *__restrict b, unsigned int *__restrict d)
[PATCH v1 1/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 2
From: Pan Li

Now that the middle-end supports form 2 of unsigned SAT_ADD and the
RISC-V backend implements .SAT_ADD for vector modes, add more test cases
to cover form 2.

Form 2:
  #define DEF_VEC_SAT_U_ADD_FMT_2(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = (T)(x + y) >= x ? (x + y) : -1; \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 16 .../riscv/rvv/autovec/binop/vec_sat_u_add-5.c | 19 + .../riscv/rvv/autovec/binop/vec_sat_u_add-6.c | 20 + .../riscv/rvv/autovec/binop/vec_sat_u_add-7.c | 20 + .../riscv/rvv/autovec/binop/vec_sat_u_add-8.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-run-5.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-6.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-7.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-8.c | 75 +++ 9 files changed, 395 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 450f0fbbc72..57b1bce4bd2 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -19,9 +19,25 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_2(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + out[i] = (T)(x + y) >= x ? 
(x + y) : -1; \ +}\ +} + #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \ + vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N) + /**/ /* Saturation Sub (Unsigned and Signed) */ /**/ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c b/gcc/testsuite/gcc.target/risc
[PATCH v1 2/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 3
From: Pan Li

After the middle-end gained support for form 3 of the unsigned SAT_ADD and
the RISC-V backend implemented .SAT_ADD for vector modes, add more test
cases to cover form 3.

Form 3:
  #define DEF_VEC_SAT_U_ADD_FMT_3(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T ret; \
        T overflow = __builtin_add_overflow (x, y, &ret); \
        out[i] = (T)(-overflow) | ret; \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 18 + .../rvv/autovec/binop/vec_sat_u_add-10.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-11.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-12.c | 20 + .../riscv/rvv/autovec/binop/vec_sat_u_add-9.c | 19 + .../rvv/autovec/binop/vec_sat_u_add-run-10.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-11.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-12.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-9.c | 75 +++ 9 files changed, 397 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 57b1bce4bd2..76f393fffbd 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -32,12 +32,30 @@ vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_3(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + T ret; \ + T overflow = __builtin_add_overflow (x, y, &ret); \ + out[i] = (T)(-overflow) | ret; \ +}\ +} + 
#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) #define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \
[PATCH v1 3/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 4
From: Pan Li

After the middle-end gained support for form 4 of the unsigned SAT_ADD and
the RISC-V backend implemented .SAT_ADD for vector modes, add more test
cases to cover form 4.

Form 4:
  #define DEF_VEC_SAT_U_ADD_FMT_4(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T ret; \
        out[i] = __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
      } \
  }

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: New test.

Passed the rv64gcv regression tests.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 17 + .../rvv/autovec/binop/vec_sat_u_add-13.c | 19 + .../rvv/autovec/binop/vec_sat_u_add-14.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-15.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-16.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-run-13.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-14.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-15.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-16.c | 75 +++ 9 files changed, 396 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 76f393fffbd..e00769e35b6 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -47,6 +47,20 @@ vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_4(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + T ret; \ + out[i] = __builtin_add_overflow (x, y, &ret) ? 
-1 : ret; \ +}\ +} + #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) @@ -56,6 +70,9 @@ vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ #define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_3(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_4(T, out, op_1, op_2, N) \ + vec_sat_u_add_##T##_fmt_4(out, op_1, op_2, N) +
[PATCH v1 4/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 5
From: Pan Li

After the middle-end gained support for form 5 of the unsigned SAT_ADD and
the RISC-V backend implemented .SAT_ADD for vector modes, add more test
cases to cover form 5.

Form 5:
  #define DEF_VEC_SAT_U_ADD_FMT_5(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T ret; \
        out[i] = __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c: New test.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 17 + .../rvv/autovec/binop/vec_sat_u_add-17.c | 19 + .../rvv/autovec/binop/vec_sat_u_add-18.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-19.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-20.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-run-17.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-18.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-19.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-20.c | 75 +++ 9 files changed, 396 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index e00769e35b6..1f2ee31577d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -61,6 +61,20 @@ vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_5(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + T ret; \ + out[i] = __builtin_add_overflow (x, y, &ret) == 0 ? 
ret : -1; \ +}\ +} + #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) @@ -73,6 +87,9 @@ vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \ #define RUN_VEC_SAT_U_ADD_FMT_4(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_4(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_5(T, out, op_1, op_2, N) \ + vec_sat_u_add_##T##_fmt_5(out, op_1, op_2, N) +
[PATCH v1 6/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 7
From: Pan Li

After the middle-end gained support for form 7 of the unsigned SAT_ADD and
the RISC-V backend implemented .SAT_ADD for vector modes, add more test
cases to cover form 7.

Form 7:
  #define DEF_VEC_SAT_U_ADD_FMT_7(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = (T)(x + y) < x ? -1 : (x + y); \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c: New test.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 16 .../rvv/autovec/binop/vec_sat_u_add-25.c | 19 + .../rvv/autovec/binop/vec_sat_u_add-26.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-27.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-28.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-run-25.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-26.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-27.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-28.c | 75 +++ 9 files changed, 395 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 0f08822cbeb..46fae4555be 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -88,6 +88,19 @@ vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_7(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + out[i] = (T)(x + y) < x ? 
-1 : (x + y);\ +}\ +} + #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) @@ -106,6 +119,9 @@ vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, unsigned limit) \ #define RUN_VEC_SAT_U_ADD_FMT_6(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_6(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_7(T, out, op_1, op_2, N) \ + vec_sat_u_add_##T##_fmt_7(out, op_1, op_2, N) + /**/ /* Saturation Sub (Unsigned and Signed)
[PATCH v1 5/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 6
From: Pan Li

After the middle-end gained support for form 6 of the unsigned SAT_ADD and
the RISC-V backend implemented .SAT_ADD for vector modes, add more test
cases to cover form 6.

Form 6:
  #define DEF_VEC_SAT_U_ADD_FMT_6(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = x <= (T)(x + y) ? (x + y) : -1; \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c: New test.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 16 .../rvv/autovec/binop/vec_sat_u_add-21.c | 19 + .../rvv/autovec/binop/vec_sat_u_add-22.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-23.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-24.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-run-21.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-22.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-23.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-24.c | 75 +++ 9 files changed, 395 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 1f2ee31577d..0f08822cbeb 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -75,6 +75,19 @@ vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_6(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + out[i] = x <= (T)(x + y) ? 
(x + y) : -1; \ +}\ +} + #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) @@ -90,6 +103,9 @@ vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, unsigned limit) \ #define RUN_VEC_SAT_U_ADD_FMT_5(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_5(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_6(T, out, op_1, op_2, N) \ + vec_sat_u_add_##T##_fmt_6(out, op_1, op_2, N) + /**/ /* Saturation Sub (Unsigned and Signed)
[PATCH v1 7/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 8
From: Pan Li

After the middle-end gained support for form 8 of the unsigned SAT_ADD and
the RISC-V backend implemented .SAT_ADD for vector modes, add more test
cases to cover form 8.

Form 8:
  #define DEF_VEC_SAT_U_ADD_FMT_8(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_8 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = x > (T)(x + y) ? -1 : (x + y); \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
	macro for testing.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-29.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-30.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-31.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-32.c: New test.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 16 .../rvv/autovec/binop/vec_sat_u_add-29.c | 19 + .../rvv/autovec/binop/vec_sat_u_add-30.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-31.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-32.c | 20 + .../rvv/autovec/binop/vec_sat_u_add-run-29.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-30.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-31.c | 75 +++ .../rvv/autovec/binop/vec_sat_u_add-run-32.c | 75 +++ 9 files changed, 395 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-29.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-30.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-31.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-32.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 46fae4555be..443f88261ba 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -101,6 +101,19 @@ vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, unsigned limit) \ }\ } +#define DEF_VEC_SAT_U_ADD_FMT_8(T) \ +void __attribute__((noinline)) \ +vec_sat_u_add_##T##_fmt_8 (T *out, T *op_1, T *op_2, unsigned limit) \ +{\ + unsigned i;\ + for (i = 0; i < limit; i++)\ +{\ + T x = op_1[i]; \ + T y = op_2[i]; \ + out[i] = x > (T)(x + y) ? 
-1 : (x + y);\ +}\ +} + #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N) @@ -122,6 +135,9 @@ vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, unsigned limit) \ #define RUN_VEC_SAT_U_ADD_FMT_7(T, out, op_1, op_2, N) \ vec_sat_u_add_##T##_fmt_7(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_ADD_FMT_8(T, out, op_1, op_2, N) \ + vec_sat_u_add_##T##_fmt_8(out, op_1, op_2, N) + /**/ /* Saturation Sub (Unsigned and Signed)
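
The forms exercised by this series are different spellings of the same unsigned saturating add. As a rough illustration only, the sketch below writes a few of them as scalar uint8_t functions and compares them exhaustively against a widened reference. The names sat_add_ref, sat_add_fmt_N and count_mismatches are made up for this note and are not part of the testsuite; __builtin_add_overflow is the GCC/Clang builtin already used in the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add in a wider type, then clamp to the uint8_t maximum.  */
static uint8_t
sat_add_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Form 2: the truncated sum is >= x exactly when no wrap-around happened.  */
static uint8_t
sat_add_fmt_2 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) >= x ? (uint8_t) (x + y) : (uint8_t) -1;
}

/* Form 3: branchless; -overflow is all-ones when the add overflowed.  */
static uint8_t
sat_add_fmt_3 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  uint8_t overflow = __builtin_add_overflow (x, y, &ret);
  return (uint8_t) (-overflow) | ret;
}

/* Form 4: select on the overflow flag directly.  */
static uint8_t
sat_add_fmt_4 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  return __builtin_add_overflow (x, y, &ret) ? (uint8_t) -1 : ret;
}

/* Form 7: same test as form 2 with the comparison inverted.  */
static uint8_t
sat_add_fmt_7 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) < x ? (uint8_t) -1 : (uint8_t) (x + y);
}

/* Compare every form against the reference for all 65536 input pairs
   and return the number of disagreements.  */
static unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
	uint8_t want = sat_add_ref ((uint8_t) x, (uint8_t) y);
	bad += sat_add_fmt_2 ((uint8_t) x, (uint8_t) y) != want;
	bad += sat_add_fmt_3 ((uint8_t) x, (uint8_t) y) != want;
	bad += sat_add_fmt_4 ((uint8_t) x, (uint8_t) y) != want;
	bad += sat_add_fmt_7 ((uint8_t) x, (uint8_t) y) != want;
      }
  return bad;
}
```

Calling count_mismatches () from a small main and checking that it returns 0 is enough to confirm the forms agree for this type. Forms 5, 6 and 8 differ from forms 4, 2 and 7 only in how the condition is written.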
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
on 2024/6/17 14:16, Richard Biener wrote: > On Mon, 17 Jun 2024, Kewen.Lin wrote: > >> Hi Richi, >> >> on 2024/6/14 18:31, Richard Biener wrote: >>> The following retires vcond{,u,eq} optabs by stopping to use them >>> from the middle-end. Targets instead (should) implement vcond_mask >>> and vec_cmp{,u,eq} optabs. The PR this change refers to lists >>> possibly affected targets - those implementing these patterns, >>> and in particular it lists mips, sparc and ia64 as targets that >>> most definitely will regress while others might simply remove >>> their vcond{,u,eq} patterns. >>> >>> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. >>> I know riscv doesn't implement any of the legacy optabs. But less >>> maintained vector targets might need adjustments. >> >> Thanks for making this change, this patch can be bootstrapped on ppc64{,le} >> but both have one failure on gcc/testsuite/gcc.target/powerpc/pr66144-3.c, >> by looking into it, I found it just exposed one oversight in the current >> rs6000 vcond_mask support (the condition mask location is wrong), so I think >> this change is fine for rs6000 port, I'll also test SPEC2017 for this (with >> rs6000 vcond_mask change) soon. > > Btw, for those targets where the patch works out fine it would be nice > to delete their vcond{,u,eq} expanders (and double-check that doesn't > cause issues on its own). OK, will do, thanks for reminding! > > Can target maintainers note whether their targets support all condition > codes for their vector comparisons (including FP variants)? And On Power, hardware only supports EQ and GT for vector INT (well ISA 3.0 supports NE for b/h/w), while EQ, GT & GE for vector FP. But vec_cmp optab supports {EQ,NE,LT,LE,GT,GE} for signed, {EQ,NE,LTU,LEU,GTU,GEU} for unsigned, and {EQ,NE,LT,LE,GT,GE,UNORDERED,ORDERED,UNEQ,LTGT,UNGE,UNGT,UNLT,UNLE} for fp. 
> whether they choose to implement all condition codes in vec_cmp
> and adjust with inversion / operand swapping for not supported cases?

Yes for the rs6000 port; some rely on define_insn_and_split.

BR,
Kewen
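
To make the "inversion / operand swapping" point concrete, here is a small scalar model, offered only as a sketch. It assumes a target whose only native signed integer vector compares are EQ and GT and derives the other codes from them. The v4si struct and the cmp_* and mask_not helpers are hypothetical stand-ins written for this note; they are not rs6000 patterns or GCC internals.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* A toy 4-lane vector; a compare result lane is -1 (true) or 0 (false).  */
typedef struct { int32_t lane[LANES]; } v4si;

/* The two compares assumed to exist natively.  */
static v4si
cmp_eq (v4si a, v4si b)
{
  v4si m;
  for (int i = 0; i < LANES; i++)
    m.lane[i] = a.lane[i] == b.lane[i] ? -1 : 0;
  return m;
}

static v4si
cmp_gt (v4si a, v4si b)
{
  v4si m;
  for (int i = 0; i < LANES; i++)
    m.lane[i] = a.lane[i] > b.lane[i] ? -1 : 0;
  return m;
}

/* Bitwise inversion of a mask.  */
static v4si
mask_not (v4si m)
{
  for (int i = 0; i < LANES; i++)
    m.lane[i] = ~m.lane[i];
  return m;
}

/* Derived codes.  NE and LE use inversion, LT uses operand swapping,
   GE uses both.  The inversions are only valid for integers: with
   floating point NaNs, !(a > b) is not the same as a <= b.  */
static v4si cmp_ne (v4si a, v4si b) { return mask_not (cmp_eq (a, b)); }
static v4si cmp_lt (v4si a, v4si b) { return cmp_gt (b, a); }
static v4si cmp_le (v4si a, v4si b) { return mask_not (cmp_gt (a, b)); }
static v4si cmp_ge (v4si a, v4si b) { return mask_not (cmp_gt (b, a)); }
```

The NaN caveat in the comment is presumably one reason a target would want a native GE for floating point rather than deriving it by inversion.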
Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]
on 2024/6/17 10:31, Peter Bergner wrote:
> On 6/16/24 9:10 PM, Kewen.Lin wrote:
>> on 2024/6/15 01:05, Peter Bergner wrote:
>>> That said, the --with-cpu=power5 build without fortran did bootstrap and
>>> regtest with no regressions, so the build did test that code path and
>>> exposed no problems.
>>
>> OK, nice! Thanks!
>
> I assume this means you're "OK" with the updated patch, correct?

Yes, OK for trunk, thanks!

>>> Currently, TARGET_ALTIVEC_ABI is defined as:
>>>
>>> #define TARGET_ALTIVEC_ABI rs6000_altivec_abi
>>>
>>> Would it make sense to redefine it to:
>>>
>>> #define TARGET_ALTIVEC_ABI (TARGET_ALTIVEC && rs6000_altivec_abi)
>>>
>>> ...or add some code in rs6000 option handling to disable rs6000_altivec_abi
>>> when TARGET_ALTIVEC is false? or do we care enough to even change it? :-)
>>
>> Assuming the current code is robust enough (perfectly guarded by some
>> altivec related condition like this altivec register saving slot), there
>> may not be any actual errors, but considering not surprising people, I'm
>> inclined to add some option handlings for it, like unsetting
>> rs6000_altivec_abi if !TARGET_ALTIVEC and give some warning if it's
>> explicitly specified, what do you think?
>
> I like it, since if Altivec is disabled, having TARGET_ALTIVEC_ABI enabled
> makes no sense to me. That is orthogonal to this bug though, so should be
> a separate patch.

Yes.

> Do you want to take a stab at writing that or do you want me to do that?

Either is fine for me, then let me give it a shot.

BR,
Kewen
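
For readers following the two alternatives discussed above, a toy model may help; it is only a sketch and not a proposed patch. The struct and its field names are hypothetical stand-ins for the real rs6000 option state, and the two functions merely mirror the two ideas in the thread: guarding the query, or normalizing the option once and warning when the user asked for it explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant option state.  */
struct toy_opts
{
  bool altivec;               /* models TARGET_ALTIVEC */
  bool altivec_abi;           /* models rs6000_altivec_abi */
  bool altivec_abi_explicit;  /* the user asked for the ABI explicitly */
};

/* Alternative 1: guard the query itself, in the spirit of
   (TARGET_ALTIVEC && rs6000_altivec_abi).  */
static bool
toy_target_altivec_abi (const struct toy_opts *o)
{
  return o->altivec && o->altivec_abi;
}

/* Alternative 2: fix up the option once during option handling.
   Returns true when a warning would be appropriate, that is, when the
   ABI was explicitly requested while Altivec is disabled.  */
static bool
toy_override_altivec_abi (struct toy_opts *o)
{
  bool warn = false;
  if (!o->altivec && o->altivec_abi)
    {
      warn = o->altivec_abi_explicit;
      o->altivec_abi = false;
    }
  return warn;
}
```

With the second approach every later reader of the flag sees a consistent value, which is the "not surprising people" argument made in the thread.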
Ping^2 [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi,
  Gently ping the series of patches.
[PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
[PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html
[PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html

Thanks
Gui Haochen

On 2024/6/3 10:40, HAO CHEN GUI wrote:
> Hi,
>   Gently ping the series of patches.
> [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
> [PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html
> [PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html
>
> Thanks
> Gui Haochen
>
> On 2024/5/24 14:02, HAO CHEN GUI wrote:
>> Hi,
>>   This patch implements optab_isinf for SFDF and IEEE128 using the test
>> data class instructions.
>>
>>   Compared with the previous version, the main change is to narrow
>> down the predicate for the float operand according to the reviewer's
>> advice.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Implement optab_isinf for SFDF and IEEE128
>>
>> gcc/
>> 	PR target/97786
>> 	* config/rs6000/vsx.md (isinf2 for SFDF): New expand.
>> 	(isinf2 for IEEE128): New expand.
>>
>> gcc/testsuite/
>> 	PR target/97786
>> 	* gcc.target/powerpc/pr97786-1.c: New test.
>> 	* gcc.target/powerpc/pr97786-2.c: New test.
>> >> patch.diff >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >> index f135fa079bd..08cce11da60 100644 >> --- a/gcc/config/rs6000/vsx.md >> +++ b/gcc/config/rs6000/vsx.md >> @@ -5313,6 +5313,24 @@ (define_expand "xststdcp" >>operands[4] = CONST0_RTX (SImode); >> }) >> >> +(define_expand "isinf2" >> + [(use (match_operand:SI 0 "gpc_reg_operand")) >> + (use (match_operand:SFDF 1 "vsx_register_operand"))] >> + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" >> +{ >> + emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30))); >> + DONE; >> +}) >> + >> +(define_expand "isinf2" >> + [(use (match_operand:SI 0 "gpc_reg_operand")) >> + (use (match_operand:IEEE128 1 "vsx_register_operand"))] >> + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" >> +{ >> + emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT >> (0x30))); >> + DONE; >> +}) >> + >> ;; The VSX Scalar Test Negative Quad-Precision >> (define_expand "xststdcnegqp_" >>[(set (match_dup 2) >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c >> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c >> new file mode 100644 >> index 000..c1c4f64ee8b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c >> @@ -0,0 +1,22 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target powerpc_vsx } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ >> + >> +int test1 (double x) >> +{ >> + return __builtin_isinf (x); >> +} >> + >> +int test2 (float x) >> +{ >> + return __builtin_isinf (x); >> +} >> + >> +int test3 (float x) >> +{ >> + return __builtin_isinff (x); >> +} >> + >> +/* { dg-final { scan-assembler-not {\mfcmp} } } */ >> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */ >> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c >> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c >> new file mode 100644 >> index 000..ed305e8572e >> --- /dev/null >> +++ 
b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c >> @@ -0,0 +1,17 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target ppc_float128_hw } */ >> +/* { dg-require-effective-target powerpc_vsx } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" >> } */ >> + >> +int test1 (long double x) >> +{ >> + return __builtin_isinf (x); >> +} >> + >> +int test2 (long double x) >> +{ >> + return __builtin_isinfl (x); >> +} >> + >> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ >> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
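
As background for what a data-class test for infinity has to decide, here is a portable sketch that classifies a value from its bit pattern instead of doing a floating point compare. It assumes IEEE 754 binary32/binary64 layouts for float and double, and the function names are invented for this note. It does not model the Power instruction itself; reading the 0x30 mask in the patch as "positive or negative infinity" is my assumption.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if X is +Inf or -Inf, 0 otherwise.  Assumes IEEE 754 binary64:
   an infinity has all exponent bits set and a zero fraction.  */
static int
isinf_bits_double (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* Clear the sign bit, then compare against the +Inf pattern.  */
  return (bits & UINT64_C (0x7fffffffffffffff))
	 == UINT64_C (0x7ff0000000000000);
}

/* Same check for IEEE 754 binary32.  */
static int
isinf_bits_float (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (bits & UINT32_C (0x7fffffff)) == UINT32_C (0x7f800000);
}
```

Because the check works on the bit pattern only, it involves no floating point compare, which is the same property the scan-assembler-not directives in the tests above look for.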
Ping^2 [PATCHv5] Optab: add isfinite_optab for __builtin_isfinite
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html

Thanks
Gui Haochen

On 2024/6/3 10:37, HAO CHEN GUI wrote:
> Hi,
>   All issues were addressed. Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html
>
> Thanks
> Gui Haochen
>
> On 2024/5/29 14:36, HAO CHEN GUI wrote:
>> Hi,
>>   This patch adds an optab for __builtin_isfinite. The finite check can be
>> implemented on rs6000 by a single instruction. An optab is needed so that
>> the builtin can be expanded to that specific instruction sequence.
>>
>>   The subsequent patches will implement the expand on rs6000.
>>
>>   Compared to the previous version, the main change is to specify that the
>> return value of the optab should be either 0 or 1.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> optab: Add isfinite_optab for isfinite builtin
>>
>> gcc/
>> 	* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>> 	for isfinite builtin.
>> 	* optabs.def (isfinite_optab): New.
>> 	* doc/md.texi (isfinite): Document.
>>
>>
>> patch.diff
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index f8d94c4b435..53e9d210541 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>        errno_set = true; builtin_optab = ilogb_optab; break;
>>      CASE_FLT_FN (BUILT_IN_ISINF):
>>        builtin_optab = isinf_optab; break;
>> -    case BUILT_IN_ISNORMAL:
>>      case BUILT_IN_ISFINITE:
>> +      builtin_optab = isfinite_optab;
>> +      break;
>> +    case BUILT_IN_ISNORMAL:
>>      CASE_FLT_FN (BUILT_IN_FINITE):
>>      case BUILT_IN_FINITED32:
>>      case BUILT_IN_FINITED64:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 5730bda80dc..3eb4216141e 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered
>> with operand 2.
>> >> This pattern is not allowed to @code{FAIL}. >> >> +@cindex @code{isfinite@var{m}2} instruction pattern >> +@item @samp{isfinite@var{m}2} >> +Return 1 if operand 1 is a finite floating point number and 0 >> +otherwise. @var{m} is a scalar floating point mode. Operand 0 >> +has mode @code{SImode}, and operand 1 has mode @var{m}. >> + >> @end table >> >> @end ifset >> diff --git a/gcc/optabs.def b/gcc/optabs.def >> index ad14f9328b9..dcd77315c2a 100644 >> --- a/gcc/optabs.def >> +++ b/gcc/optabs.def >> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") >> OPTAB_D (hypot_optab, "hypot$a3") >> OPTAB_D (ilogb_optab, "ilogb$a2") >> OPTAB_D (isinf_optab, "isinf$a2") >> +OPTAB_D (isfinite_optab, "isfinite$a2") >> OPTAB_D (issignaling_optab, "issignaling$a2") >> OPTAB_D (ldexp_optab, "ldexp$a3") >> OPTAB_D (log10_optab, "log10$a2")
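
For readers following the "either 0 or 1" requirement in the md.texi text above, here is a minimal sketch of what such a finite check computes. It is not the rs6000 expander or any GCC code; the helper names isfinite_bits and agrees_with_libm are hypothetical, and the sketch assumes double is IEEE 754 binary64.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, assuming IEEE 754 binary64 doubles.
   Returns exactly 1 for a finite value and exactly 0 for Inf or NaN,
   mirroring the 0/1 result the pattern documentation asks for.  */
static int
isfinite_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* An exponent field of all ones encodes Inf or NaN.  */
  return ((bits >> 52) & 0x7ff) != 0x7ff;
}

/* Cross-check against the standard isfinite macro, which is only
   guaranteed to return nonzero (not necessarily 1) for finite values.  */
static int
agrees_with_libm (double x)
{
  return isfinite_bits (x) == (isfinite (x) != 0);
}
```

The comparison operator already yields an int that is 0 or 1, so no extra normalization is needed in the sketch.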
Re: Patch ping
On Mon, Jun 17, 2024 at 03:26:52PM +0200, Jakub Jelinek wrote:
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653573.html
> patch.  While the committed and backported patch fixed PCH on PIE
> cc1/cc1plus etc. on PowerPC, it grew up the size of the
> rs6000_init_generated_builtins function quite a lot.
> The above patch decreases it back, to even less than the size of
> the function before my fix.

A patch in the middle of a thread.  I missed it, sorry.  Please send
patches as separate threads?


Segher
[PATCH] diagnostics: Fix add_misspelling_candidates [PR115440]
Hi!

The option_map array for most entries contains just non-NULL opt0
    { "-Wno-", NULL, "-W", false, true },
    { "-fno-", NULL, "-f", false, true },
    { "-gno-", NULL, "-g", false, true },
    { "-mno-", NULL, "-m", false, true },
    { "--debug=", NULL, "-g", false, false },
    { "--machine-", NULL, "-m", true, false },
    { "--machine-no-", NULL, "-m", false, true },
    { "--machine=", NULL, "-m", false, false },
    { "--machine=no-", NULL, "-m", false, true },
    { "--machine", "", "-m", false, false },
    { "--machine", "no-", "-m", false, true },
    { "--optimize=", NULL, "-O", false, false },
    { "--std=", NULL, "-std=", false, false },
    { "--std", "", "-std=", false, false },
    { "--warn-", NULL, "-W", true, false },
    { "--warn-no-", NULL, "-W", false, true },
    { "--", NULL, "-f", true, false },
    { "--no-", NULL, "-f", false, true }
and so add_misspelling_candidates works correctly for it, but 3 out of
these,
    { "--machine", "", "-m", false, false },
    { "--machine", "no-", "-m", false, true },
and
    { "--std", "", "-std=", false, false },
use non-NULL opt1.  That says that
--machine foo
should map to
-mfoo
and
--machine no-foo
should map to
-mno-foo
and
--std c++17
should map to
-std=c++17
add_misspelling_candidates was not handling this, so it happily
registered say
--stdc++17
or
--machineavx512
(twice) as spelling alternatives, when those options aren't recognized.
Instead we support
--std c++17
or
--machine avx512
--machine no-avx512

The following patch fixes that.  On this particular testcase, we no longer
suggest anything, even though the suggestion candidates include say
--std c++17 or -std=c++17 etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-17  Jakub Jelinek

    PR driver/115440
    * opts-common.cc (add_misspelling_candidates): If opt1 is non-NULL,
    add a space and opt1 to the alternative suggestion text.

    * g++.dg/cpp1z/pr115440.C: New test.

--- gcc/opts-common.cc.jj 2024-06-14 19:44:34.434236887 +0200 +++ gcc/opts-common.cc 2024-06-17 10:58:14.351178400 +0200 @@ -524,6 +524,7 @@ add_misspelling_candidates (auto_vecsafe_push (alternative); } } --- gcc/testsuite/g++.dg/cpp1z/pr115440.C.jj2024-06-17 10:55:28.607380969 +0200 +++ gcc/testsuite/g++.dg/cpp1z/pr115440.C 2024-06-17 11:04:38.334075632 +0200 @@ -0,0 +1,8 @@ +// PR driver/115440 +// { dg-do compile { target c++17_only } } +// { dg-options "--c++17" } + +int i; + +// { dg-bogus "unrecognized command-line option '--c\\\+\\\+17'; did you mean '--stdc\\\+\\\+17'" "" { target *-*-* } 0 } +// { dg-error "unrecognized command-line option '--c\\\+\\\+17'" "" { target *-*-* } 0 } Jakub
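
The ChangeLog entry above ("add a space and opt1 to the alternative suggestion text") can be pictured with a small standalone sketch. This is not the code from opts-common.cc; struct map_entry and make_alternative are hypothetical names, and only the three string fields of each option_map entry are modeled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced view of an option_map entry: only the three
   string fields from the table in the message above.  */
struct map_entry
{
  const char *opt0;
  const char *opt1;       /* May be NULL.  */
  const char *new_prefix;
};

/* If OPT_TEXT starts with E->new_prefix, write the alternative spelling
   into BUF and return 1; otherwise return 0.  When opt1 is non-NULL the
   alternative is two arguments, so a space is placed between opt0 and
   opt1 followed by the rest of the option text.  */
static int
make_alternative (const struct map_entry *e, const char *opt_text,
                  char *buf, size_t len)
{
  size_t n = strlen (e->new_prefix);
  if (strncmp (opt_text, e->new_prefix, n) != 0)
    return 0;
  snprintf (buf, len, "%s%s%s%s",
            e->opt0,
            e->opt1 ? " " : "",
            e->opt1 ? e->opt1 : "",
            opt_text + n);
  return 1;
}
```

With the entry { "--std", "", "-std=" } and the option text -std=c++17 this produces the two-argument form with a space, rather than the unrecognized --stdc++17.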
[Committed] RISC-V: Add configure check for Zaamo/Zalrsc assembler support
Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to prevent emitting Zaamo/Zalrsc in the arch string when the
assembler does not support it.

gcc/ChangeLog:

    * common/config/riscv/riscv-common.cc
    (riscv_subset_list::to_string): Skip zaamo/zalrsc when not
    supported by the assembler.
    * config.in: Regenerate.
    * configure: Regenerate.
    * configure.ac: Add zaamo/zalrsc assembler check.

Signed-off-by: Patrick O'Neill
Acked-by: Palmer Dabbelt # RISC-V
Reviewed-by: Palmer Dabbelt # RISC-V
---
Tested using newlib rv64gc with binutils tip-of-tree and 2.42.

This results in calls being emitted when compiling for _zaamo_zalrsc when
the assembler does not support these extensions.

> cat amo.c
void foo (int* bar, int* baz)
{
  __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
}
> gcc -march=rv64id_zaamo_zalrsc -O3 amo.c
results in:
foo:
    sext.w  a1,a1
    li  a2,0
    tail    __atomic_fetch_add_4

As a result there are some testsuite failures on zalrsc specific
testcases and when using an old version of binutils on non-a targets.
Not a cause for concern imo but worth calling out.
Also testcases that check for the default isa string will fail with the
old binutils since zaamo/zalrsc aren't emitted anymore.
---
 gcc/common/config/riscv/riscv-common.cc | 11 +
 gcc/config.in | 6 +
 gcc/configure | 31 +
 gcc/configure.ac | 5
 4 files changed, 53 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 78dfd6b1470..1dc1d9904c7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -916,6 +916,7 @@ riscv_subset_list::to_string (bool version_p) const
   riscv_subset_t *subset;
   bool skip_zifencei = false;
+  bool skip_zaamo_zalrsc = false;
   bool skip_zicsr = false;
   bool i2p0 = false;
@@ -943,6 +944,10 @@ riscv_subset_list::to_string (bool version_p) const
      a mistake in that binutils 2.35 supports zicsr but not zifencei.
*/ skip_zifencei = true; #endif +#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC + /* Skip since binutils 2.42 and earlier don't recognize zaamo/zalrsc. */ + skip_zaamo_zalrsc = true; +#endif for (subset = m_head; subset != NULL; subset = subset->next) { @@ -954,6 +959,12 @@ riscv_subset_list::to_string (bool version_p) const subset->name == "zicsr") continue; + if (skip_zaamo_zalrsc && subset->name == "zaamo") + continue; + + if (skip_zaamo_zalrsc && subset->name == "zalrsc") + continue; + /* For !version_p, we only separate extension with underline for multi-letter extension. */ if (!first && diff --git a/gcc/config.in b/gcc/config.in index e41b6dc97cd..acab3c0f126 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -629,6 +629,12 @@ #endif +/* Define if the assembler understands -march=rv*_zaamo_zalrsc. */ +#ifndef USED_FOR_TARGET +#undef HAVE_AS_MARCH_ZAAMO_ZALRSC +#endif + + /* Define if the assembler understands -march=rv*_zifencei. */ #ifndef USED_FOR_TARGET #undef HAVE_AS_MARCH_ZIFENCEI diff --git a/gcc/configure b/gcc/configure index 94970e24051..9dc0b65dfaa 100755 --- a/gcc/configure +++ b/gcc/configure @@ -30820,6 +30820,37 @@ if test $gcc_cv_as_riscv_march_zifencei = yes; then $as_echo "#define HAVE_AS_MARCH_ZIFENCEI 1" >>confdefs.h +fi + +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -march=rv32i_zaamo_zalrsc support" >&5 +$as_echo_n "checking assembler for -march=rv32i_zaamo_zalrsc support... " >&6; } +if ${gcc_cv_as_riscv_march_zaamo_zalrsc+:} false; then : + $as_echo_n "(cached) " >&6 +else + gcc_cv_as_riscv_march_zaamo_zalrsc=no + if test x$gcc_cv_as != x; then +$as_echo '' > conftest.s +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -march=rv32i_zaamo_zalrsc -o conftest.o conftest.s >&5' + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 + (eval $ac_try) 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + test $ac_status = 0; }; } +then + gcc_cv_as_riscv_march_zaamo_zalrsc=yes +else + echo "configure: failed program was" >&5 + cat conftest.s >&5 +fi +rm -f conftest.o conftest.s + fi +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_riscv_march_zaamo_zalrsc" >&5 +$as_echo "$gcc_cv_as_riscv_march_zaamo_zalrsc" >&6; } +if test $gcc_cv_as_riscv_march_zaamo_zalrsc = yes; then + +$as_echo "#define HAVE_AS_MARCH_ZAAMO_ZALRSC 1" >>confdefs.h + fi ;; diff --git a/gcc/configure.ac b/gcc/configure.ac index 35475cf5aae..b2243e9954a 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -5452,6 +5452,11 @@ configured with --enable-newlib-nano-formatted-io.]) [-march=rv32i_zifencei2p0],,, [AC_DEFINE(HAVE_AS_MARCH_ZIFENCEI, 1, [
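
The effect of the new skip_zaamo_zalrsc flag on the emitted arch string can be sketched in isolation. The function below is a deliberately simplified stand-in, not riscv_subset_list::to_string: arch_string, its parameters, and the flat array of subset names are all hypothetical, and versions are ignored entirely.

```c
#include <string.h>

/* Hypothetical, simplified model: concatenate subset names into OUT,
   separating multi-letter extensions with '_', and drop zaamo/zalrsc
   when the assembler is assumed not to understand them.  LEN must be
   at least 1.  */
static void
arch_string (const char *const *subsets, size_t n, int skip_zaamo_zalrsc,
             char *out, size_t len)
{
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      const char *name = subsets[i];

      if (skip_zaamo_zalrsc
          && (strcmp (name, "zaamo") == 0 || strcmp (name, "zalrsc") == 0))
        continue;

      /* Only multi-letter extensions after the first item get an
         underscore separator.  */
      if (out[0] != '\0' && strlen (name) > 1)
        strncat (out, "_", len - strlen (out) - 1);
      strncat (out, name, len - strlen (out) - 1);
    }
}
```

In the real patch the flag is set at build time from HAVE_AS_MARCH_ZAAMO_ZALRSC; here it is an ordinary argument so both outcomes can be compared side by side.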
Re: [PATCH] RISC-V: Add configure check for Zaamo/Zalrsc assembler support
On 6/13/24 13:02, Jeff Law wrote:
> On 6/12/24 5:20 PM, Patrick O'Neill wrote:
>> Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
>> check to prevent emitting Zaamo/Zalrsc in the arch string when the
>> assembler does not support it.
>>
>> gcc/ChangeLog:
>>
>>    * common/config/riscv/riscv-common.cc
>>    (riscv_subset_list::to_string): Skip zaamo/zalrsc when not
>>    supported by the assembler.
>>    * config.in: Regenerate.
>>    * configure: Regenerate.
>>    * configure.ac: Add zaamo/zalrsc assembler check.
> OK.  It looks like you've got some unexpected diff fragments in
> configure -- all the LARGE_OFF_T stuff.  They look OK to me, but
> something like that is usually a sign of different autoconf versions.
> I wouldn't lose any sleep if you left them as-is or removed those hunks
> before committing.
>
> jeff

Removed the hunks and committed. Sent the committed version to the list
for the archiver.

I'll rebase the promotion RFC [1] on top and resolve the warning that
Andreas Schwab noticed.

Patrick

[1]: https://patchwork.sourceware.org/project/gcc/patch/20240613233059.1451117-1-patr...@rivosinc.com/
Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]
on 2024/6/15 01:05, Peter Bergner wrote: > On 6/13/24 10:26 PM, Peter Bergner wrote: >> On 6/13/24 9:26 PM, Kewen.Lin wrote: > I understand this is just copied from the if arm, but if I read this > right, it can be > simplified as: Ok, I'll retest with that simplification. >> >> So I retested a normal powerpc64le-linux build (ie, we default to Power8 >> with Altivec) and it bootstrapped and regtested with no regressions. >> I then attempted a --with-cpu=power5 build to test the non-altivec path, >> but both the unpatched and patched builds died building libgfortran with >> the following error: "error: ‘_Float128’ is not supported on this target". >> I believe that is related to PR113652. I'll kick off the build again, >> this time disabling Fortran and seeing if the build completes. > > My bad for calling the --with-cpu=power5 bootstrap build on ELFv2 a "bug". > It's not, since ELFv2 mandates a cpu with at least ISA 2.07 (eg. Power8) > support and some of the libgfortran code was written assuming that, so what > I was trying to do was really not supported (ie, luser error). > > That said, the --with-cpu=power5 build without fortran did bootstrap and > regtest with no regressions, so the build did test that code path and > exposed no problems. OK, nice! Thanks! > > > That's what I expected too! :-) However, I was surprised to learn that -mno-altivec does *not* disable TARGET_ALTIVEC_ABI. I had to explicitly use the -mabi= option to expose the bug. >>> >>> oh, it's surprising, I learn something today! :) I guess it's not >>> intentional but just no >>> one noticed it, as it seems nonsense to have altivec ABI extension but not >>> using any altivec >>> features. 
>
> Currently, TARGET_ALTIVEC_ABI is defined as:
>
>   #define TARGET_ALTIVEC_ABI rs6000_altivec_abi
>
> Would it make sense to redefine it to:
>
>   #define TARGET_ALTIVEC_ABI (TARGET_ALTIVEC && rs6000_altivec_abi)
>
> ...or add some code in rs6000 option handling to disable rs6000_altivec_abi
> when TARGET_ALTIVEC is false?  or do we care enough to even change it? :-)

Assuming the current code is robust enough (perfectly guarded by some altivec
related condition like this altivec register saving slot), there may not be
any actual errors, but to avoid surprising people, I'm inclined to add some
option handling for it, like unsetting rs6000_altivec_abi if !TARGET_ALTIVEC
and giving a warning if it's explicitly specified. What do you think?

BR,
Kewen
[committed] c++: Fix up floating point conversion rank comparison for _Float32 and float if float/double are same size [PR115511]
Hi! On AVR and SH with some options sizeof (float) == sizeof (double) and the 2 types have the same set of values. http://eel.is/c++draft/conv.rank#2.2 for this says that double still has bigger rank than float and http://eel.is/c++draft/conv.rank#2.2 says that extended type with the same set of values as more than one standard floating point type shall have the same rank as double. I've implemented the latter rule as if (cnt > 1 && mv2 == long_double_type_node) return -2; with the _Float64/double/long double case having same mode case (various targets with -mlong-double-64) in mind. But never thought there are actually targets where float and double are the same, that needs handling too, if cnt > 1 (that is the extended type mv1 has same set of values as 2 or 3 of float/double/long double) and mv2 is float, we need to return 2, because mv1 in that case should have same rank as double and double has bigger rank than float. Bootstrapped/regtested on x86_64-linux and i686-linux and checked with a cross-compiler to avr-none on the testcase, which previously ICEd because the function returned _Float32 and float have the same rank, just different subrank and for _Float32 vs. double also returned they have the same rank. Committed to trunk as obvious, will backport to 14/13 soon. 2024-06-17 Jakub Jelinek PR target/111343 PR c++/115511 * typeck.cc (cp_compare_floating_point_conversion_ranks): If an extended floating point type mv1 has same set of values as more than one standard floating point type and mv2 is float, return 2. * g++.dg/cpp23/ext-floating18.C: New test. --- gcc/cp/typeck.cc.jj 2024-06-04 13:19:03.755604346 +0200 +++ gcc/cp/typeck.cc2024-06-17 10:32:02.063088961 +0200 @@ -393,6 +393,9 @@ cp_compare_floating_point_conversion_ran has higher rank. */ if (cnt > 1 && mv2 == long_double_type_node) return -2; + /* And similarly if t2 is float, t2 has lower rank. 
*/ + if (cnt > 1 && mv2 == float_type_node) +return 2; /* Otherwise, they have equal rank, but extended types (other than std::bfloat16_t) have higher subrank. std::bfloat16_t shouldn't have equal rank to any standard --- gcc/testsuite/g++.dg/cpp23/ext-floating18.C.jj 2024-06-17 18:39:01.740020581 +0200 +++ gcc/testsuite/g++.dg/cpp23/ext-floating18.C 2024-06-17 18:47:19.152779782 +0200 @@ -0,0 +1,26 @@ +// P1467R9 - Extended floating-point types and standard names. +// { dg-do compile { target c++23 } } +// { dg-options "" } +// { dg-add-options float32 } + +constexpr int foo (float) { return 1; } +constexpr int foo (double) { return 2; } +constexpr int foo (long double) { return 3; } + +#ifdef __STDCPP_FLOAT32_T__ +#if __FLT_MAX_EXP__ == __FLT32_MAX_EXP__ \ +&& __FLT_MAX_DIG__ == __FLT32_MAX_DIG__ +#if __FLT_MAX_EXP__ == __DBL_MAX_EXP__ \ +&& __FLT_MAX_DIG__ == __DBL_MAX_DIG__ +static_assert (foo (1.0f32) == 2); +#else +static_assert (foo (1.0f32) == 1); +#endif +#endif +#endif +#ifdef __STDCPP_FLOAT64_T__ +#if __DBL_MAX_EXP__ == __FLT64_MAX_EXP__ \ +&& __DBL_MAX_DIG__ == __FLT64_MAX_DIG__ +static_assert (foo (1.0f64) == 2); +#endif +#endif Jakub
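
The rank rule described in the message can be modeled with a tiny decision function. This is only an illustration under stated assumptions, not cp_compare_floating_point_conversion_ranks: the enum std_fp, the function compare_ext_vs_std, and the precondition that mv2 is one of the standard types sharing mv1's set of values are all hypothetical. The return values 2 and -2 come from the patch; the value 1 for "equal rank, extended type has higher subrank" is an assumption based on the comment quoted in the hunk.

```c
/* Hypothetical stand-ins for the three standard floating-point types.  */
enum std_fp { STD_FLOAT, STD_DOUBLE, STD_LONG_DOUBLE };

/* Simplified model.  CNT is the number of standard types that have the
   same set of values as the extended type mv1; MV2 is the standard type
   being compared (assumed to be one of those CNT types).
   Returns 2 if mv1 has the higher rank, -2 if mv2 has the higher rank,
   and 1 (assumed encoding) if the ranks are equal and mv1 only wins on
   subrank.  */
static int
compare_ext_vs_std (int cnt, enum std_fp mv2)
{
  /* mv1 ranks like double, and long double outranks double.  */
  if (cnt > 1 && mv2 == STD_LONG_DOUBLE)
    return -2;
  /* mv1 ranks like double, and double outranks float.  */
  if (cnt > 1 && mv2 == STD_FLOAT)
    return 2;
  /* Equal rank; the extended type has the higher subrank.  */
  return 1;
}
```

On a target where float and double have the same representation, cnt is 2 for _Float32, so comparing it against float gives 2, which matches the new test expecting the double overload to be chosen.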
[PATCH] c-family: Fix -Warray-compare warning ICE [PR115290]
Hi! The warning code uses %D to print the ARRAY_REF first operands. That works in the most common case where those operands are decls, but as can be seen on the following testcase, they can be other expressions with array type. Just changing %D to %E isn't enough, because then the diagnostics can suggest something like note: use '&(x) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0] == &(y) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0]' to compare the addresses which is a bad suggestion, the %E printing doesn't know that the warning code will want to add & before it and [0] after it. So, the following patch adds ()s around the operand as well, but does that only for non-decls, for decls keeps it as &arr[0] like before. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and release branches? 2024-06-17 Jakub Jelinek PR c/115290 * c-warn.cc (do_warn_array_compare): Use %E rather than %D for printing op0 and op1; if those operands aren't decls, also print parens around them. * c-c++-common/Warray-compare-3.c: New test. --- gcc/c-family/c-warn.cc.jj 2024-06-04 13:19:03.371609456 +0200 +++ gcc/c-family/c-warn.cc 2024-06-17 15:07:09.005737065 +0200 @@ -3832,11 +3832,16 @@ do_warn_array_compare (location_t locati /* C doesn't allow +arr. */ if (c_dialect_cxx ()) inform (location, "use unary %<+%> which decays operands to pointers " - "or %<&%D[0] %s &%D[0]%> to compare the addresses", - op0, op_symbol_code (code), op1); + "or %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses", + DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")", + op_symbol_code (code), + DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? "" : ")"); else - inform (location, "use %<&%D[0] %s &%D[0]%> to compare the addresses", - op0, op_symbol_code (code), op1); + inform (location, + "use %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses", + DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")", + op_symbol_code (code), + DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? 
"" : ")"); } } --- gcc/testsuite/c-c++-common/Warray-compare-3.c.jj2024-06-17 15:13:57.098422635 +0200 +++ gcc/testsuite/c-c++-common/Warray-compare-3.c 2024-06-17 15:13:24.339849049 +0200 @@ -0,0 +1,13 @@ +/* PR c/115290 */ +/* { dg-do compile } */ +/* { dg-options "-Warray-compare" } */ + +int a[32][32], b[32][32]; + +int +foo (int x, int y) +{ + return (x ? a : b) == (y ? a : b); /* { dg-warning "comparison between two arrays" } */ +/* { dg-message "use '&\\\(\[^\n\r]*\\\)\\\[0\\\] == &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the addresses" "" { target c } .-1 } */ +/* { dg-message "use unary '\\\+' which decays operands to pointers or '&\\\(\[^\n\r]*\\\)\\\[0\\\] == &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the addresses" "" { target c++ } .-2 } */ +} Jakub
Re: [PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE
Hi Paul, on 2024/6/14 23:20, Paul Koning wrote: > Ok, I understand better now. But if those macros are supposed to be replaced > by hook functions, could you make that replacement part of the proposed patch? The default implementation of the introduced hook mode_for_floating_type returns SFmode for float and DFmode for double or long double, which matches what pdp11 port requires, so there is no need to add its own hook implementation. This patch series only re-define this hook macro with the customized hook implementation for those ports which need something beyond the default. BR, Kewen > > paul > >> On Jun 13, 2024, at 11:22 PM, Kewen.Lin wrote: >> >> Hi Paul, >> >> on 2024/6/14 04:07, Paul Koning wrote: >>> What is the effect of this change? The original code intended to have >>> "float" mean a 32 bit value, and "double" a 64 bit value. There aren't any >>> larger floats, so I defined the long double size as 64 also. Is the right >>> answer not to define it? >> >> Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE, >> target code building will fail >> if it still has these macros. As I'd like to squash these target changes >> onto 09/52, so I didn't note >> the background/context here, sorry about that. >> >>> >>> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE >>> and DOUBLE_TYPE_SIZE without explanation and without mention in the >>> changelog? >> >> Oops, thanks for catching! I just noticed this sub-patch has inconsistent >> subject & changelog, I should >> have noticed this as it has a quite different subject from the others. :( >> With your finding, I just >> re-visited all the other sub-patches, luckily they are consistent. >> >> The below is the updated revision, hope it looks good to you. Thanks again. 
>> >> BR, >> Kewen >> - >> >> Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE >> >> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE >> defines in pdp11 port, as we want to replace these macros >> with hook mode_for_floating_type and poison them. >> >> gcc/ChangeLog: >> >>* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove. >>(DOUBLE_TYPE_SIZE): Likewise. >>(LONG_DOUBLE_TYPE_SIZE): Likewise. >> --- >> gcc/config/pdp11/pdp11.h | 11 --- >> 1 file changed, 11 deletions(-) >> >> diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h >> index 2446fea0b58..6c8e045bc57 100644 >> --- a/gcc/config/pdp11/pdp11.h >> +++ b/gcc/config/pdp11/pdp11.h >> @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3. If not see >> #define LONG_TYPE_SIZE 32 >> #define LONG_LONG_TYPE_SIZE64 >> >> -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64, >> - but that conflicts with Fortran language rules. Since there is no >> - obvious reason why we should have that feature -- other targets >> - generally don't have float and double the same size -- I've removed >> - it. Note that it continues to be true (for now) that arithmetic is >> - always done with 64-bit values, i.e., the FPU is always in "double" >> - mode. */ >> -#define FLOAT_TYPE_SIZE32 >> -#define DOUBLE_TYPE_SIZE 64 >> -#define LONG_DOUBLE_TYPE_SIZE 64 >> - >> /* machine types from ansi */ >> #define SIZE_TYPE "short unsigned int" /* definition of size_t */ >> #define WCHAR_TYPE "short int" /* or long int */ >> -- >> 2.43.0 >> >> >
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
On Mon, Jun 17, 2024 at 08:16:34AM +0200, Richard Biener wrote: > On Mon, 17 Jun 2024, Kewen.Lin wrote: > > > Hi Richi, > > > > on 2024/6/14 18:31, Richard Biener wrote: > > > The following retires vcond{,u,eq} optabs by stopping to use them > > > from the middle-end. Targets instead (should) implement vcond_mask > > > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > > > possibly affected targets - those implementing these patterns, > > > and in particular it lists mips, sparc and ia64 as targets that > > > most definitely will regress while others might simply remove > > > their vcond{,u,eq} patterns. > > > > > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > > > I know riscv doesn't implement any of the legacy optabs. But less > > > maintained vector targets might need adjustments. > > > > Thanks for making this change, this patch can be bootstrapped on ppc64{,le} > > but both have one failure on gcc/testsuite/gcc.target/powerpc/pr66144-3.c, > > by looking into it, I found it just exposed one oversight in the current > > rs6000 vcond_mask support (the condition mask location is wrong), so I think > > this change is fine for rs6000 port, I'll also test SPEC2017 for this (with > > rs6000 vcond_mask change) soon. > > Btw, for those targets where the patch works out fine it would be nice > to delete their vcond{,u,eq} expanders (and double-check that doesn't > cause issues on its own). > > Can target maintainers note whether their targets support all condition > codes for their vector comparisons (including FP variants)? And > whether they choose to implement all condition codes in vec_cmp > and adjust with inversion / operand swapping for not supported cases? On s390 we support all comparison operations with inverse / operand swapping via s390_expand_vec_compare. However, we still have some failures for which I opened PR115519. Currently it is unclear to me what precisely is missing and will have a further look. 
vcond_mask expander is also implemented for all modes. Cheers, Stefan > > Thanks, > Richard. > > > BR, > > Kewen > > > > > > > > I want to get rid of those optabs for GCC 15. If I don't hear from > > > you I will assume your target is fine. > > > > > > Thanks, > > > Richard. > > > > > > PR middle-end/114189 > > > * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing. > > > (get_vcond_eq_icode): Likewise. > > > --- > > > gcc/optabs-query.h | 13 - > > > 1 file changed, 4 insertions(+), 9 deletions(-) > > > > > > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h > > > index 0cb2c21ba85..31fbce80175 100644 > > > --- a/gcc/optabs-query.h > > > +++ b/gcc/optabs-query.h > > > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, > > > machine_mode mask_mode) > > > mode CMODE, unsigned if UNS is true, resulting in a value of mode > > > VMODE. */ > > > > > > inline enum insn_code > > > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns) > > > +get_vcond_icode (machine_mode, machine_mode, bool) > > > { > > > - enum insn_code icode = CODE_FOR_nothing; > > > - if (uns) > > > -icode = convert_optab_handler (vcondu_optab, vmode, cmode); > > > - else > > > -icode = convert_optab_handler (vcond_optab, vmode, cmode); > > > - return icode; > > > + return CODE_FOR_nothing; > > > } > > > > > > /* Return insn code for a conditional operator with a mask mode > > > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, > > > machine_mode mmode) > > > mode CMODE (only EQ/NE), resulting in a value of mode VMODE. */ > > > > > > inline enum insn_code > > > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode) > > > +get_vcond_eq_icode (machine_mode, machine_mode) > > > { > > > - return convert_optab_handler (vcondeq_optab, vmode, cmode); > > > + return CODE_FOR_nothing; > > > } > > > > > > /* Enumerates the possible extraction_insn operations. 
*/ > > > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
[PATCH] rs6000: Shrink rs6000_init_generated_builtins size [PR115324]
Hi! While my r15-1001-g4cf2de9b5268224 PCH PIE power fix change decreased the .data section sizes (219792 -> 189336), it increased the size of already huge rs6000_init_generated_builtins generated function, from 218328 to 228668 bytes. That is because there are thousands of array references to global arrays and we keep constructing the addresses of the arrays again and again. Ideally some optimization would figure out we have a single function which has 461 rs6000_overload_info 1257 rs6000_builtin_info_fntype 1768 rs6000_builtin_decls 2548 rs6000_instance_info_fntype array references and that maybe it might be a good idea to just preload the addresses of those arrays into some register if it decreases code size and doesn't slow things down. The function actually is called just once and is huge, so code size is even more important than speed, which is dominated by all the GC allocations anyway. Until that is done, here is a slightly cleaner version of the hack, which makes the function noipa (so that LTO doesn't undo it) for GCC 8.1+ and passes the 4 arrays as arguments to the function from the caller. This decreases the function size from 228668 bytes to 207572 bytes. Bootstrapped/regtested on powerpc64le-linux, ok for trunk? 2024-06-17 Jakub Jelinek PR target/115324 * config/rs6000/rs6000-gen-builtins.cc (write_decls): Change declaration of rs6000_init_generated_builtins from no arguments to 4 pointer arguments. (write_init_bif_table): Change rs6000_builtin_info_fntype to builtin_info_fntype and rs6000_builtin_decls to builtin_decls. (write_init_ovld_table): Change rs6000_instance_info_fntype to instance_info_fntype, rs6000_builtin_decls to builtin_decls and rs6000_overload_info to overload_info. (write_init_file): Add __noipa__ attribute to rs6000_init_generated_builtins for GCC 8.1+ and change the function from no arguments to 4 pointer arguments. Change rs6000_builtin_decls to builtin_decls. 
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Adjust rs6000_init_generated_builtins caller. --- gcc/config/rs6000/rs6000-gen-builtins.cc.jj 2024-06-03 23:11:02.662631144 +0200 +++ gcc/config/rs6000/rs6000-gen-builtins.cc2024-06-03 23:38:31.727620920 +0200 @@ -2376,7 +2376,10 @@ write_decls (void) "rs6000_instance_info_fntype[RS6000_INST_MAX];\n"); fprintf (header_file, "extern ovldrecord rs6000_overload_info[];\n\n"); - fprintf (header_file, "extern void rs6000_init_generated_builtins ();\n\n"); + fprintf (header_file, + "extern void rs6000_init_generated_builtins (tree *, tree *,\n"); + fprintf (header_file, + "\t\t\t\t\tovldrecord *, tree *);\n\n"); fprintf (header_file, "extern bool rs6000_builtin_is_supported (rs6000_gen_builtins);\n"); fprintf (header_file, @@ -2651,7 +2654,7 @@ write_init_bif_table (void) for (int i = 0; i <= curr_bif; i++) { fprintf (init_file, - " rs6000_builtin_info_fntype[RS6000_BIF_%s]" + " builtin_info_fntype[RS6000_BIF_%s]" "\n= %s;\n", bifs[i].idname, bifs[i].fndecl); @@ -2678,7 +2681,7 @@ write_init_bif_table (void) } fprintf (init_file, - " rs6000_builtin_decls[(int)RS6000_BIF_%s] = t\n", + " builtin_decls[(int)RS6000_BIF_%s] = t\n", bifs[i].idname); fprintf (init_file, "= add_builtin_function (\"%s\",\n", @@ -2719,7 +2722,7 @@ write_init_bif_table (void) fprintf (init_file, "}\n"); fprintf (init_file, " else\n"); fprintf (init_file, "{\n"); - fprintf (init_file, " rs6000_builtin_decls" + fprintf (init_file, " builtin_decls" "[(int)RS6000_BIF_%s] = NULL_TREE;\n", bifs[i].idname); fprintf (init_file, "}\n"); } @@ -2740,7 +2743,7 @@ write_init_ovld_table (void) for (int i = 0; i <= curr_ovld; i++) { fprintf (init_file, - " rs6000_instance_info_fntype[RS6000_INST_%s]" + " instance_info_fntype[RS6000_INST_%s]" "\n= %s;\n", ovlds[i].ovld_id_name, ovlds[i].fndecl); @@ -2772,7 +2775,7 @@ write_init_ovld_table (void) } fprintf (init_file, - " rs6000_builtin_decls[(int)RS6000_OVLD_%s] = t\n", + " 
builtin_decls[(int)RS6000_OVLD_%s] = t\n", stanza->stanza_id); fprintf (init_file, "= add_builtin_function (\"%s\",\n", @@ -2793,7 +2796,7 @@ write_init_ovld_table (void) fprintf (init_file, "\n"); fprintf (init_file, - " rs6000_overload_info[RS6000_OVLD_%s - base]" + " overload_info[RS6000_OVLD_%s - base]" ".first_instance\n",
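
The idea behind the patch, passing the global array addresses into the huge initializer once instead of rematerializing them at every reference, can be sketched on a toy scale. Nothing below is the generated rs6000 code: the table names, sizes, stored values, and the functions init_generated and init_all are hypothetical, and the version guard is just one way to express "GCC 8.1 or later".

```c
#define N_ENTRIES 4

/* Hypothetical global tables standing in for rs6000_builtin_decls and
   rs6000_builtin_info_fntype.  */
static int builtin_decls_table[N_ENTRIES];
static int info_fntype_table[N_ENTRIES];

/* noipa keeps interprocedural optimizations (including LTO) from
   propagating the global addresses back into the body, which would
   undo the size saving.  The attribute exists in GCC 8.1 and later.  */
#if defined (__GNUC__) && !defined (__clang__) \
    && (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ >= 1))
__attribute__ ((__noipa__))
#endif
static void
init_generated (int *builtin_decls, int *info_fntype)
{
  /* Straight-line stores through the parameters, as a generated
     initializer would emit; each array base is already in a register.  */
  info_fntype[0] = 10;  builtin_decls[0] = 100;
  info_fntype[1] = 11;  builtin_decls[1] = 101;
  info_fntype[2] = 12;  builtin_decls[2] = 102;
  info_fntype[3] = 13;  builtin_decls[3] = 103;
}

/* The single caller passes the global arrays explicitly.  */
static void
init_all (void)
{
  init_generated (builtin_decls_table, info_fntype_table);
}
```

The real function takes four pointers and contains thousands of stores, which is where the reported saving of roughly 20 KB comes from; the toy version only shows the calling shape.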
Re: [PATCH] c-family: Fix -Warray-compare warning ICE [PR115290]
On Mon, Jun 17, 2024 at 07:09:03PM +0200, Jakub Jelinek wrote: > Hi! > > The warning code uses %D to print the ARRAY_REF first operands. > That works in the most common case where those operands are decls, but > as can be seen on the following testcase, they can be other expressions > with array type. > Just changing %D to %E isn't enough, because then the diagnostics can > suggest something like > note: use '&(x) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0] == &(y) != 0 ? > (int (*)[32])&a : (int (*)[32])&b[0]' to compare the addresses > which is a bad suggestion, the %E printing doesn't know that the > warning code will want to add & before it and [0] after it. > So, the following patch adds ()s around the operand as well, but does > that only for non-decls, for decls keeps it as &arr[0] like before. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk > and release branches? Ok, thanks. > 2024-06-17 Jakub Jelinek > > PR c/115290 > * c-warn.cc (do_warn_array_compare): Use %E rather than %D for > printing op0 and op1; if those operands aren't decls, also print > parens around them. > > * c-c++-common/Warray-compare-3.c: New test. > > --- gcc/c-family/c-warn.cc.jj 2024-06-04 13:19:03.371609456 +0200 > +++ gcc/c-family/c-warn.cc2024-06-17 15:07:09.005737065 +0200 > @@ -3832,11 +3832,16 @@ do_warn_array_compare (location_t locati >/* C doesn't allow +arr. */ >if (c_dialect_cxx ()) > inform (location, "use unary %<+%> which decays operands to pointers " > - "or %<&%D[0] %s &%D[0]%> to compare the addresses", > - op0, op_symbol_code (code), op1); > + "or %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses", > + DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")", > + op_symbol_code (code), > + DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? 
"" : ")"); >else > - inform (location, "use %<&%D[0] %s &%D[0]%> to compare the addresses", > - op0, op_symbol_code (code), op1); > + inform (location, > + "use %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses", > + DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")", > + op_symbol_code (code), > + DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? "" : ")"); > } > } > > --- gcc/testsuite/c-c++-common/Warray-compare-3.c.jj 2024-06-17 > 15:13:57.098422635 +0200 > +++ gcc/testsuite/c-c++-common/Warray-compare-3.c 2024-06-17 > 15:13:24.339849049 +0200 > @@ -0,0 +1,13 @@ > +/* PR c/115290 */ > +/* { dg-do compile } */ > +/* { dg-options "-Warray-compare" } */ > + > +int a[32][32], b[32][32]; > + > +int > +foo (int x, int y) > +{ > + return (x ? a : b) == (y ? a : b); /* { dg-warning "comparison between two > arrays" } */ > +/* { dg-message "use '&\\\(\[^\n\r]*\\\)\\\[0\\\] == > &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the addresses" "" { target c } .-1 } > */ > +/* { dg-message "use unary '\\\+' which decays operands to pointers or > '&\\\(\[^\n\r]*\\\)\\\[0\\\] == &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the > addresses" "" { target c++ } .-2 } */ > +} > > Jakub > Marek
Re: [pushed 2/3] libcpp: move label_text to its own header
Hi, On Thu, Jun 6, 2024 at 7:05 PM Andrew Pinski wrote: > > On Thu, Jun 6, 2024 at 9:00 AM David Malcolm wrote: > > > > On Thu, 2024-06-06 at 08:40 -0700, Andrew Pinski wrote: > > > On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg > > > wrote: > > > > > > > > Dear David, > > > > > > > > On Tue, May 28, 2024 at 10:07 PM David Malcolm > > > > wrote: > > > > > > > > > > No functional change intended. > > > > > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > > > > Pushed to trunk as r15-874-g9bda2c4c81b668. > > > > > > > > > > libcpp/ChangeLog: > > > > > * Makefile.in (TAGS_SOURCES): Add include/label-text.h. > > > > > * include/label-text.h: New file. > > > > > * include/rich-location.h: Include "label-text.h". > > > > > (class label_text): Move to label-text.h. > > > > > > > > > > Signed-off-by: David Malcolm > > > > > --- > > > > > libcpp/Makefile.in | 2 +- > > > > > libcpp/include/label-text.h| 102 > > > > > + > > > > > libcpp/include/rich-location.h | 79 + > > > > > 3 files changed, 105 insertions(+), 78 deletions(-) > > > > > create mode 100644 libcpp/include/label-text.h > > > > > > > > > > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in > > > > > index ebbca3fb..7e47153264c0 100644 > > > > > --- a/libcpp/Makefile.in > > > > > +++ b/libcpp/Makefile.in > > > > > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@ > > > > > > > > > > TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \ > > > > > include/cpplib.h include/line-map.h include/mkdeps.h > > > > > include/symtab.h \ > > > > > -include/rich-location.h > > > > > +include/rich-location.h include/label-text.h > > > > > > > > this does not seem to be enough that the new header will be > > > > installed. 
> > > > I get compile errors when compiling a plug-in with this patch: > > > > > > > > In file included from > > > > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux- > > > > gnu/15.0.0/plugin/include/diagnostic.h:24, > > > > from > > > > /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc- > > > > plugin/../src/adapters/compiler/gcc- > > > > plugin/scorep_plugin_inst_descriptor.cpp:43: > > > > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux- > > > > gnu/15.0.0/plugin/include/rich-location.h:25:10: > > > > fatal error: label-text.h: No such file or directory > > > > 25 | #include "label-text.h" > > > > > ^~ > > > > compilation terminated. > > > > > > I have a fix which I am testing. > > > > Likewise (and sorry about the breakage) > > Committed as r15-1076-g6e6471806d886b . Thanks. I can confirm that my external plugin builds again. Bert > > > > > Dave > >
Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target
On 6/14/24 1:37 PM, Carl Love wrote:
> Per the additional feedback after patch:
>
> commit c892525813c94b018464d5a4edc17f79186606b7
> Author: Carl Love
> Date: Tue Jun 11 14:01:16 2024 -0400
>
> rs6000, altivec-2-runnable.c should be a runnable test
>
> The test case has "dg-do compile" set not "dg-do run" for a runnable
> test. This patch changes the dg-do command argument to run.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
> argument to run.

Test case altivec-1-runnable.c seems to have the same issue, in that it is currently a dg-do compile test case rather than the intended dg-do run. Can you have a look at changing that to dg-do run too?

My guess is that this one will want something similar to some other altivec test cases, ala:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-require-effective-target powerpc_altivec_ok } */
/* { dg-options "-O2 -maltivec -mabi=altivec" } */

That said, I don't like not having a -mdejagnu-cpu=... here. I think for our server cpus, this is fine, but on an embedded system with an old ISA default for -mcpu=... (so we'd be doing a dg-do compile), just adding -maltivec to that default may not make much sense and probably should be an error. Maybe something like:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-require-effective-target powerpc_altivec_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */

...makes more sense? Ke Wen & Segher, thoughts on that?

Ke Wen, should powerpc_altivec_ok be powerpc_altivec here?

Peter
Re: [PATCH 2/3] Enabled LRA for ia64.
On Fri, 14 Jun 2024, Jonathan Wakely wrote: > Both, ideally. The libstdc++ test should definitely be fixed because > it fails with released versions of glibc already in the wild. But > glibc should also be fixed because it's a standards conformance issue. The __ctx macro used in various sys/ucontext.h headers prepends __ in standards conformance modes (the point being to avoid breaking the API outside such modes when we fixed the namespace issues). #ifdef __USE_MISC # define __ctx(fld) fld #else # define __ctx(fld) __ ## fld #endif (bits/sigcontext.h didn't get any such fixes as it's not included at all in standards conformance modes, only if __USE_MISC.) -- Joseph S. Myers josmy...@redhat.com
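For readers less familiar with the token-pasting trick behind __ctx, here is a minimal sketch of the same idea. The names DEMO_USE_MISC, demo_ctx, demo_mcontext and demo_read_greg are made up for illustration (the real macro pastes a reserved "__" prefix and is keyed on __USE_MISC); this is not the glibc source, only the shape of it.

```c
#include <assert.h>
#include <stddef.h>

/* DEMO_USE_MISC and demo_ctx are hypothetical stand-ins for glibc's
   __USE_MISC and __ctx.  When the feature macro is not defined, the
   member name gets a prefix pasted on, so the plain name stays out of
   the user's namespace.  */
#ifdef DEMO_USE_MISC
# define demo_ctx(fld) fld
#else
# define demo_ctx(fld) demo_ ## fld
#endif

struct demo_mcontext
{
  /* Declares "gregs" or "demo_gregs" depending on the mode.  */
  long demo_ctx (gregs)[4];
};

/* Read register slot I through whichever member name is in effect.  */
static long
demo_read_greg (const struct demo_mcontext *mc, size_t i)
{
  assert (i < 4);
  return mc->demo_ctx (gregs)[i];
}
```

Code that always goes through the macro keeps working in both modes; code that spells the plain member name only compiles when the feature macro is defined, which is the situation the libstdc++ names tests ran into.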
Re: [PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE
Thanks Kewen. Given that background, the patch is OK. paul > On Jun 16, 2024, at 10:01 PM, Kewen.Lin wrote: > > Hi Paul, > > on 2024/6/14 23:20, Paul Koning wrote: >> Ok, I understand better now. But if those macros are supposed to be >> replaced by hook functions, could you make that replacement part of the >> proposed patch? > > The default implementation of the introduced hook mode_for_floating_type > returns SFmode for float and DFmode for double or long double, which matches > what pdp11 port requires, so there is no need to add its own hook > implementation. > This patch series only re-define this hook macro with the customized hook > implementation for those ports which need something beyond the default. > > BR, > Kewen > >> >> paul >> >>> On Jun 13, 2024, at 11:22 PM, Kewen.Lin wrote: >>> >>> Hi Paul, >>> >>> on 2024/6/14 04:07, Paul Koning wrote: What is the effect of this change? The original code intended to have "float" mean a 32 bit value, and "double" a 64 bit value. There aren't any larger floats, so I defined the long double size as 64 also. Is the right answer not to define it? >>> >>> Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE, >>> target code building will fail >>> if it still has these macros. As I'd like to squash these target changes >>> onto 09/52, so I didn't note >>> the background/context here, sorry about that. >>> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE and DOUBLE_TYPE_SIZE without explanation and without mention in the changelog? >>> >>> Oops, thanks for catching! I just noticed this sub-patch has inconsistent >>> subject & changelog, I should >>> have noticed this as it has a quite different subject from the others. :( >>> With your finding, I just >>> re-visited all the other sub-patches, luckily they are consistent. >>> >>> The below is the updated revision, hope it looks good to you. Thanks again. 
>>> >>> BR, >>> Kewen >>> - >>> >>> Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE >>> >>> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE >>> defines in pdp11 port, as we want to replace these macros >>> with hook mode_for_floating_type and poison them. >>> >>> gcc/ChangeLog: >>> >>> * config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove. >>> (DOUBLE_TYPE_SIZE): Likewise. >>> (LONG_DOUBLE_TYPE_SIZE): Likewise. >>> --- >>> gcc/config/pdp11/pdp11.h | 11 --- >>> 1 file changed, 11 deletions(-) >>> >>> diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h >>> index 2446fea0b58..6c8e045bc57 100644 >>> --- a/gcc/config/pdp11/pdp11.h >>> +++ b/gcc/config/pdp11/pdp11.h >>> @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3. If not see >>> #define LONG_TYPE_SIZE 32 >>> #define LONG_LONG_TYPE_SIZE64 >>> >>> -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64, >>> - but that conflicts with Fortran language rules. Since there is no >>> - obvious reason why we should have that feature -- other targets >>> - generally don't have float and double the same size -- I've removed >>> - it. Note that it continues to be true (for now) that arithmetic is >>> - always done with 64-bit values, i.e., the FPU is always in "double" >>> - mode. */ >>> -#define FLOAT_TYPE_SIZE32 >>> -#define DOUBLE_TYPE_SIZE 64 >>> -#define LONG_DOUBLE_TYPE_SIZE 64 >>> - >>> /* machine types from ansi */ >>> #define SIZE_TYPE "short unsigned int" /* definition of size_t */ >>> #define WCHAR_TYPE "short int" /* or long int */ >>> -- >>> 2.43.0 >>> >>> >> >
[PATCH] c++: ICE with generic lambda and pack expansion [PR115425]
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- In r13-272 we hardened the *_PACK_EXPANSION and *_ARGUMENT_PACK macros. That trips up here because make_pack_expansion returns error_mark_node and we access that with PACK_EXPANSION_LOCAL_P. PR c++/115425 gcc/cp/ChangeLog: * pt.cc (tsubst_pack_expansion): Return error_mark_node if make_pack_expansion doesn't work out. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic12.C: New test. --- gcc/cp/pt.cc | 2 ++ gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C | 25 +++ 2 files changed, 27 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 607753ae6b7..e676372f75b 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -13775,6 +13775,8 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain, else result = tsubst (pattern, args, complain, in_decl); result = make_pack_expansion (result, complain); + if (result == error_mark_node) + return error_mark_node; PACK_EXPANSION_LOCAL_P (result) = PACK_EXPANSION_LOCAL_P (t); PACK_EXPANSION_SIZEOF_P (result) = PACK_EXPANSION_SIZEOF_P (t); if (PACK_EXPANSION_AUTO_P (t)) diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C b/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C new file mode 100644 index 000..219529c7c32 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C @@ -0,0 +1,25 @@ +// PR c++/115425 +// { dg-do compile { target c++20 } } + +using size_t = decltype(sizeof(0)); + +template +struct X {}; + +template +void foo(X); + +template +struct S; + +template +auto test() { + constexpr static auto x = foo>(); // { dg-error "no matching function" } + return [](X) { +(typename S::type{}, ...); + }(X<__integer_pack (0)...>{}); +} + +int main() { + test(); +} base-commit: b63c7d92012f92e0517190cf263d29bbef8a06bf -- 2.45.1
[PATCH V3 0/2] Fix ICE with vwsll combine on 32bit targets
The following testcases have been failing on rv32 targets since r15-953-gaf4bf422a69:

FAIL: gcc.target/riscv/rvv/autovec/binop/vwsll-1.c (internal compiler error: in maybe_legitimize_operand, at optabs.cc:8056)
FAIL: gcc.target/riscv/rvv/autovec/binop/vwsll-1.c (test for excess errors)

Fix the bug and also robustify our emit_insn by making an assertion check unconditional.

I'm not sure if this ICE warrants its own separate testcase since it is already being tested. I do have a minimal testcase on hand if we would like to add one.

V2: Remove subreg condition and change assert to internal error
V3: Update the _trunc_scalar splitter as well

Edwin Lu (2):
  RISC-V: Fix vwsll combine on rv32 targets
  RISC-V: Move mode assertion out of conditional branch in emit_insn

 gcc/config/riscv/autovec-opt.md | 6 ++
 gcc/config/riscv/riscv-v.cc | 25 +++--
 2 files changed, 21 insertions(+), 10 deletions(-)

--
2.34.1
[PATCH V3 1/2] RISC-V: Fix vwsll combine on rv32 targets
On rv32 targets, vwsll_zext1_scalar_ would trigger an ICE in maybe_legitimize_operand when zero extending a uint32 to uint64 due to a mismatch between the input operand's mode (DI) and the expanded insn operand's mode (Pmode == SI). Ensure that the modes of the operands match.

Tested on rv32/64 gcv newlib. Letting CI perform additional testing.

gcc/ChangeLog:

    * config/riscv/autovec-opt.md: Fix mode mismatch

Signed-off-by: Edwin Lu
Co-authored-by: Robin Dapp
---
V2: Remove subreg check
V3: Update _trunc_scalar splitter as well
---
 gcc/config/riscv/autovec-opt.md | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 6a2eabbd854..d7a3cfd4602 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1517,8 +1517,7 @@ (define_insn_and_split "*vwsll_zext1_scalar_"
   "&& 1"
   [(const_int 0)]
   {
-    if (GET_CODE (operands[2]) == SUBREG)
-      operands[2] = SUBREG_REG (operands[2]);
+    operands[2] = gen_lowpart (Pmode, operands[2]);
     insn_code icode = code_for_pred_vwsll_scalar (mode);
     riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
     DONE;
@@ -1584,8 +1583,7 @@ (define_insn_and_split "*vwsll_zext1_trunc_scalar_"
   "&& 1"
   [(const_int 0)]
   {
-    if (GET_CODE (operands[2]) == SUBREG)
-      operands[2] = SUBREG_REG (operands[2]);
+    operands[2] = gen_lowpart (Pmode, operands[2]);
     insn_code icode = code_for_pred_vwsll_scalar (mode);
     riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
     DONE;
--
2.34.1
[PATCH V3 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn
When emitting insns, we have an early assertion to ensure the input operand's mode and the expanded operand's mode are the same; however, it does not perform this check if the pattern does not have an explicit machine mode specifying the operand. In this scenario, it will always assume that mode = Pmode to correctly satisfy the maybe_legitimize_operand check; however, there may be problems when working in 32-bit environments.

Make the assert unconditional and replace it with an internal error for more descriptive logging.

gcc/ChangeLog:

    * config/riscv/riscv-v.cc: Move assert out of conditional block

Signed-off-by: Edwin Lu
Co-authored-by: Robin Dapp
---
V2: change assert to internal error
V3: No change
---
 gcc/config/riscv/riscv-v.cc | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8911f5783c8..5306711c1b7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -50,6 +50,7 @@
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
 #include "predict.h"
+#include "errors.h"
 
 using namespace riscv_vector;
 
@@ -290,11 +291,17 @@ public:
              always Pmode.  */
           if (mode == VOIDmode)
             mode = Pmode;
-          else
-            /* Early assertion ensures same mode since maybe_legitimize_operand
-               will check this.  */
-            gcc_assert (GET_MODE (ops[opno]) == VOIDmode
-                        || GET_MODE (ops[opno]) == mode);
+
+          /* Early assertion ensures same mode since maybe_legitimize_operand
+             will check this.  */
+          machine_mode required_mode = GET_MODE (ops[opno]);
+          if (required_mode != VOIDmode && required_mode != mode)
+            internal_error ("expected mode %s for operand %d of "
+                            "insn %s but got mode %s.\n",
+                            GET_MODE_NAME (mode),
+                            opno,
+                            insn_data[(int) icode].name,
+                            GET_MODE_NAME (required_mode));
 
           add_input_operand (ops[opno], mode);
         }
@@ -346,7 +353,13 @@ public:
     else if (m_insn_flags & VXRM_RDN_P)
       add_rounding_mode_operand (VXRM_RDN);
 
-    gcc_assert (insn_data[(int) icode].n_operands == m_opno);
+
+    if (insn_data[(int) icode].n_operands != m_opno)
+      internal_error ("invalid number of operands for insn %s, "
+                      "expected %d but got %d.\n",
+                      insn_data[(int) icode].name,
+                      insn_data[(int) icode].n_operands, m_opno);
+
     expand (icode, any_mem_p);
   }
 
--
2.34.1
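To make the behavioural change easier to see outside of riscv-v.cc, the following is a rough standalone sketch of the check after the patch. Everything here (demo_mode, demo_mode_name, demo_check_operand_mode) is a hypothetical stand-in, not GCC code; it returns an error code where the real code calls internal_error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Made-up stand-ins for a few machine_mode values; not GCC's enum.  */
enum demo_mode { DEMO_VOIDmode, DEMO_SImode, DEMO_DImode };

static const char *
demo_mode_name (enum demo_mode m)
{
  switch (m)
    {
    case DEMO_SImode: return "SI";
    case DEMO_DImode: return "DI";
    default: return "VOID";
    }
}

/* Return 0 if an operand of mode OPERAND_MODE is acceptable for a pattern
   operand of mode PATTERN_MODE, else write a message to MSG and return -1.
   A VOIDmode pattern operand falls back to PMODE, and the comparison is
   then done unconditionally, including for that fallback.  */
static int
demo_check_operand_mode (enum demo_mode pattern_mode,
                         enum demo_mode operand_mode,
                         enum demo_mode pmode,
                         char *msg, size_t len)
{
  assert (msg != NULL && len > 0);
  msg[0] = '\0';

  enum demo_mode mode = pattern_mode == DEMO_VOIDmode ? pmode : pattern_mode;

  if (operand_mode != DEMO_VOIDmode && operand_mode != mode)
    {
      snprintf (msg, len, "expected mode %s but got mode %s",
                demo_mode_name (mode), demo_mode_name (operand_mode));
      return -1;
    }
  return 0;
}
```

With a VOIDmode pattern operand, a DI input and Pmode == SI (the rv32 situation from patch 1/2), the sketch reports a mismatch; with Pmode == DI (rv64) the same input passes, which is why the problem only showed up on 32-bit targets.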
Re: [PATCH 2/3] Enabled LRA for ia64.
On Mon, 17 Jun 2024 at 19:03, Joseph Myers wrote: > > On Fri, 14 Jun 2024, Jonathan Wakely wrote: > > > Both, ideally. The libstdc++ test should definitely be fixed because > > it fails with released versions of glibc already in the wild. But > > glibc should also be fixed because it's a standards conformance issue. > > The __ctx macro used in various sys/ucontext.h headers prepends __ in > standards conformance modes (the point being to avoid breaking the API > outside such modes when we fixed the namespace issues). > > #ifdef __USE_MISC > # define __ctx(fld) fld > #else > # define __ctx(fld) __ ## fld > #endif > > (bits/sigcontext.h didn't get any such fixes as it's not included at all > in standards conformance modes, only if __USE_MISC.) I see, thanks. So it's not a problem in C, only in C++ due to G++ defining _GNU_SOURCE. Let's just change the libstdc++ tests then.
Re: [C PATCH, v3] Fix for redeclared enumerator initialized with different type [PR115109]
On Sat, 15 Jun 2024, Martin Uecker wrote: > The patch fails on arm because the tests make assumptions > about enums that are not true everywhere. Should we just > limit the tests to x86? For compilation tests, using -fno-short-enums should work. That won't work for link / execute tests, but in those cases you can use { target { ! short_enums } }. (If there are other issues beyond a short-enums default, other effective-targets may be needed.) -- Joseph S. Myers josmy...@redhat.com
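As an illustration of the kind of assumption that breaks when short enums are the default, here is a small sketch; the enum and function names are hypothetical and are not taken from the tests in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* A small enum whose values all fit in one byte.  */
enum demo_small { DEMO_A, DEMO_B, DEMO_C };

/* What does hold for GCC targets, with or without short enums: a small
   enum is never wider than int.  */
static_assert (sizeof (enum demo_small) <= sizeof (int),
               "small enum wider than int");

/* Size of the enum type as the current compiler lays it out.  With
   -fno-short-enums this is normally sizeof (int); with -fshort-enums it
   can be as small as 1.  */
static size_t
demo_enum_size (void)
{
  return sizeof (enum demo_small);
}

/* A test that hard-codes this as true is making the assumption that
   fails when short enums are in effect.  */
static int
demo_enum_is_int_sized (void)
{
  return demo_enum_size () == sizeof (int);
}
```

Building a compile-only test with -fno-short-enums pins the layout, so a check like demo_enum_is_int_sized becomes stable across targets; for link and execute tests the effective-target selector mentioned above is the safer route.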
Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]
On 6/16/24 9:10 PM, Kewen.Lin wrote:
> on 2024/6/15 01:05, Peter Bergner wrote:
>> That said, the --with-cpu=power5 build without fortran did bootstrap and
>> regtest with no regressions, so the build did test that code path and
>> exposed no problems.
>
> OK, nice! Thanks!

I assume this means you're "OK" with the updated patch, correct?

>> Currently, TARGET_ALTIVEC_ABI is defined as:
>>
>> #define TARGET_ALTIVEC_ABI rs6000_altivec_abi
>>
>> Would it make sense to redefine it to:
>>
>> #define TARGET_ALTIVEC_ABI (TARGET_ALTIVEC && rs6000_altivec_abi)
>>
>> ...or add some code in rs6000 option handling to disable rs6000_altivec_abi
>> when TARGET_ALTIVEC is false? or do we care enough to even change it?
>> :-)
>
> Assuming the current code is robust enough (perfectly guarded by some altivec
> related condition like this altivec register saving slot), there may not be
> any actual errors, but considering not surprising people, I'm inclined to add
> some option handlings for it, like unsetting rs6000_altivec_abi if
> !TARGET_ALTIVEC and give some warning if it's explicitly specified, what do
> you think?

I like it, since if Altivec is disabled, having TARGET_ALTIVEC_ABI enabled makes no sense to me. That is orthogonal to this bug though, so should be a separate patch. Do you want to take a stab at writing that or do you want me to do that?

Peter
Re: [committed] testsuite: Add -Wno-psabi to vshuf-mem.C test
On Mon, Jun 17, 2024 at 09:09:37PM +0200, Andreas Krebbel wrote: > On 6/14/24 20:03, Jakub Jelinek wrote: > > Also wonder about the > > // { dg-additional-options "-march=z14" { target s390*-*-* } } > > line, doesn't that mean the test will FAIL on all pre-z14 HW? > > Shouldn't it use some z14_runtime or similar effective target, or > > check in main (in that case copied over to g++.target/s390) whether > > z14 instructions can be actually used at runtime? > > Oh right. I'll remove that line and replicate the testcase in the arch > specific test dir. Though, looking around some more, perhaps // { dg-additional-options "-march=z14" { target s390_vxe } } might be all that is needed, even in current dir. Jakub
Re: [PATCH 2/3] Enabled LRA for ia64.
On 17.06.24 20:53, Jonathan Wakely wrote:
> On Mon, 17 Jun 2024 at 19:03, Joseph Myers wrote:
> > On Fri, 14 Jun 2024, Jonathan Wakely wrote:
> > > Both, ideally. The libstdc++ test should definitely be fixed because
> > > it fails with released versions of glibc already in the wild. But
> > > glibc should also be fixed because it's a standards conformance issue.
> >
> > The __ctx macro used in various sys/ucontext.h headers prepends __ in
> > standards conformance modes (the point being to avoid breaking the API
> > outside such modes when we fixed the namespace issues).
> >
> > #ifdef __USE_MISC
> > # define __ctx(fld) fld
> > #else
> > # define __ctx(fld) __ ## fld
> > #endif
> >
> > (bits/sigcontext.h didn't get any such fixes as it's not included at all
> > in standards conformance modes, only if __USE_MISC.)
>
> I see, thanks. So it's not a problem in C, only in C++ due to G++
> defining _GNU_SOURCE. Let's just change the libstdc++ tests then.

Great, I tested exactly that on Friday, patched in the same way as in [1]. It makes the three failing tests pass:

```
# make check-target-libstdc++-v3 RUNTESTFLAGS="conformance.exp=17_intro/names*\ experimental/names.cc"
Test run by root on Fri Jun 14 16:04:26 2024
Native configuration is ia64-t2-linux-gnu

=== libstdc++ tests ===

Schedule of variations:
unix

Running target unix
Running /dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
PASS: 17_intro/names.cc -std=gnu++17 (test for excess errors)
PASS: 17_intro/names_pstl.cc -std=gnu++17 (test for excess errors)
PASS: experimental/names.cc -std=gnu++17 (test for excess errors)

=== libstdc++ Summary ===

# of expected passes 3
```

[1]: https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=cf5f7791056b3ed993bc8024be767a86157514a9

You most likely want the workaround as a separate patch on this list, as the failures were happening for both the non-LRA and LRA case, right?

Cheers,
Frank
Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs
Hi Andre,

On 17.06.24 at 09:51, Andre Vehreschild wrote:
> Regarding your question on the coarray-tests that are not in the
> coarray-directory: These tests in most cases test only one method of
> implementing coarrays. I.e., they are either testing just -fcoarray=single or
> -fcoarray=lib -lcaf_single, which are two different approaches. The tests in
> the coarray-directory test all available methods to implement coarrays. Pushing

ah, that explains it. I only looked at some of the test sources, but did not think of looking at caf.exp ...

> all coarray-tests into the coarray-directory will fail a lot of them, because
> the behavior of -fcoarray=single and -fcoarray=lib -lcaf_single is different in
> some corner cases. That's why the coarray-tests in the main gfortran-dir are
> separate. I do understand why it may be confusing, but I don't see an easy
> solution. Does this answer your question?

Indeed it does!

Thanks,
Harald
[committed] c: Implement C2Y alignof on incomplete arrays
C2Y has adopted support for alignof applied to incomplete array types (N3273). Add this support to GCC. As the relevant checks are in c-family code that doesn't have access to functions such as pedwarn_c23, this remains a hard error for older versions and isn't handled by -Wc23-c2y-compat, although preferably it would work like pedwarn_c23 (pedwarn-if-pedantic for older versions, warning with -Wc23-c2y-compat in C2Y mode). Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/c-family/ * c-common.cc (c_sizeof_or_alignof_type): Allow alignof on an incomplete array type for C2Y. gcc/testsuite/ * gcc.dg/c23-align-10.c, gcc.dg/c2y-align-1.c, gcc.dg/c2y-align-2.c: New tests. diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index 24335deeb58..7d752acd430 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -3972,7 +3972,9 @@ c_sizeof_or_alignof_type (location_t loc, value = size_one_node; } else if (!COMPLETE_TYPE_P (type) - && (!c_dialect_cxx () || is_sizeof || type_code != ARRAY_TYPE)) + && ((!c_dialect_cxx () && !flag_isoc2y) + || is_sizeof + || type_code != ARRAY_TYPE)) { if (complain) error_at (loc, "invalid application of %qs to incomplete type %qT", diff --git a/gcc/testsuite/gcc.dg/c23-align-10.c b/gcc/testsuite/gcc.dg/c23-align-10.c new file mode 100644 index 000..bd6b9c268c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/c23-align-10.c @@ -0,0 +1,6 @@ +/* Test C2Y alignof on an incomplete array type: not allowed in C23. */ +/* { dg-do compile } */ +/* { dg-options "-std=c23 -pedantic-errors" } */ + +int a = alignof(int[]); /* { dg-error "incomplete" } */ +int b = alignof(int[][1]); /* { dg-error "incomplete" } */ diff --git a/gcc/testsuite/gcc.dg/c2y-align-1.c b/gcc/testsuite/gcc.dg/c2y-align-1.c new file mode 100644 index 000..3f9ab18c518 --- /dev/null +++ b/gcc/testsuite/gcc.dg/c2y-align-1.c @@ -0,0 +1,6 @@ +/* Test C2Y alignof on an incomplete array type. 
*/ +/* { dg-do compile } */ +/* { dg-options "-std=c2y -pedantic-errors" } */ + +int a = alignof(int[]); +int b = alignof(int[][1]); diff --git a/gcc/testsuite/gcc.dg/c2y-align-2.c b/gcc/testsuite/gcc.dg/c2y-align-2.c new file mode 100644 index 000..b7b87150413 --- /dev/null +++ b/gcc/testsuite/gcc.dg/c2y-align-2.c @@ -0,0 +1,8 @@ +/* Test C2Y alignof on an incomplete array type: still not allowed for other + incomplete types. */ +/* { dg-do compile } */ +/* { dg-options "-std=c2y -pedantic-errors" } */ + +int a = alignof(void); /* { dg-error "void" } */ +struct s; +int b = alignof(struct s); /* { dg-error "incomplete" } */ -- Joseph S. Myers josmy...@redhat.com
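One way to see why alignof on an incomplete array type is well defined: the alignment of an array type is the alignment of its element type, so the (unknown) element count never enters into it. The sketch below sticks to C11 and complete array types so that it builds without -std=c2y; alignof(int[]) itself needs a compiler with the change above. The demo_ names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The alignment of an array type is that of its element type, whatever
   the number of elements.  */
static_assert (_Alignof (int[4]) == _Alignof (int),
               "array alignment differs from element alignment");
static_assert (_Alignof (int[4][1]) == _Alignof (int),
               "nested array alignment differs from element alignment");

/* Alignment of a complete array of int.  */
static size_t
demo_align_int_array (void)
{
  return _Alignof (int[4]);
}

/* Alignment of the element type.  */
static size_t
demo_align_int (void)
{
  return _Alignof (int);
}
```

The same reasoning does not extend to void or to an incomplete struct type, where there is no element type to fall back on, which matches what c2y-align-2.c checks.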
Re: [PATCH] diagnostics: Fix add_misspelling_candidates [PR115440]
On Mon, 17 Jun 2024, Jakub Jelinek wrote: > 2024-06-17 Jakub Jelinek > > PR driver/115440 > * opts-common.cc (add_misspelling_candidates): If opt1 is non-NULL, > add a space and opt1 to the alternative suggestion text. > > * g++.dg/cpp1z/pr115440.C: New test. OK. -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab
Hi Richi, on 2024/6/14 18:31, Richard Biener wrote: > The following retires vcond{,u,eq} optabs by stopping to use them > from the middle-end. Targets instead (should) implement vcond_mask > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > possibly affected targets - those implementing these patterns, > and in particular it lists mips, sparc and ia64 as targets that > most definitely will regress while others might simply remove > their vcond{,u,eq} patterns. > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64. > I know riscv doesn't implement any of the legacy optabs. But less > maintained vector targets might need adjustments. Thanks for making this change, this patch can be bootstrapped on ppc64{,le} but both have one failure on gcc/testsuite/gcc.target/powerpc/pr66144-3.c, by looking into it, I found it just exposed one oversight in the current rs6000 vcond_mask support (the condition mask location is wrong), so I think this change is fine for rs6000 port, I'll also test SPEC2017 for this (with rs6000 vcond_mask change) soon. BR, Kewen > > I want to get rid of those optabs for GCC 15. If I don't hear from > you I will assume your target is fine. > > Thanks, > Richard. > > PR middle-end/114189 > * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing. > (get_vcond_eq_icode): Likewise. > --- > gcc/optabs-query.h | 13 - > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h > index 0cb2c21ba85..31fbce80175 100644 > --- a/gcc/optabs-query.h > +++ b/gcc/optabs-query.h > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode > mask_mode) > mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE. 
> */ > > inline enum insn_code > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns) > +get_vcond_icode (machine_mode, machine_mode, bool) > { > - enum insn_code icode = CODE_FOR_nothing; > - if (uns) > -icode = convert_optab_handler (vcondu_optab, vmode, cmode); > - else > -icode = convert_optab_handler (vcond_optab, vmode, cmode); > - return icode; > + return CODE_FOR_nothing; > } > > /* Return insn code for a conditional operator with a mask mode > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode > mmode) > mode CMODE (only EQ/NE), resulting in a value of mode VMODE. */ > > inline enum insn_code > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode) > +get_vcond_eq_icode (machine_mode, machine_mode) > { > - return convert_optab_handler (vcondeq_optab, vmode, cmode); > + return CODE_FOR_nothing; > } > > /* Enumerates the possible extraction_insn operations. */
Re: [PATCH] xtensa: constantsynth: Reforge to fix some non-fatal issues
Hi Suwa-san, On Mon, Jun 17, 2024 at 04:17:15PM +0900, Takayuki 'January June' Suwa wrote: > The previous constant synthesis logic had some issues that were non-fatal > but worth considering: > > - It didn't work with DFmode literals, because those were cast to SImode > rather SFmode when splitting into two natural-width words by > split_double(). > > - It didn't work with large literals when TARGET_AUTO_LITPOOLS was enabled, > because those were relaxed MOVI immediates rather references to literal > pool entries, > > - It didn't take into account that when literals with the same RTL > representation are pooled multiple times within a function, those entries > are shared (especially important when optimizing for size). > > This patch addresses the above issues by making appropriate tweaks to the > constant synthesis logic. > > gcc/ChangeLog: > > * config/xtensa/xtensa-protos.h (xtensa_constantsynth): > Change the second argument from HOST_WIDE_INT to rtx. > * config/xtensa/xtensa.cc (#include): > Add "context.h" and "pass_manager.h". > (machine_function): Add a new hash_map field "litpool_usage". > (xtensa_constantsynth): Make "src" (the second operand) accept > RTX literal instead of its value, and treat both bare and pooled > SI/SFmode literals equally by bit-exact canonicalization into > CONST_INT RTX internally. And then, make avoid synthesis if > such multiple identical canonicalized literals are found in same > function when optimizing for size. Finally, for literals where > synthesis is not possible or has been avoided, re-emit "move" > RTXes with canonicalized ones to increase the chances of sharing > literal pool entries. > * config/xtensa/xtensa.md (split patterns for constant synthesis): > Change to simply invoke xtensa_constantsynth() as mentioned above, > and add new patterns for when TARGET_AUTO_LITPOOLS is enabled. 
> --- > gcc/config/xtensa/xtensa-protos.h | 2 +- > gcc/config/xtensa/xtensa.cc | 75 --- > gcc/config/xtensa/xtensa.md | 56 ++- > 3 files changed, 103 insertions(+), 30 deletions(-) This series introduced a few ICE regressions: +FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -Os (internal compiler error: Segmentation fault) +FAIL: gcc.dg/atomic/c11-atomic-exec-3.c -Os (internal compiler error: Segmentation fault) +FAIL: gcc.dg/atomic/c11-atomic-exec-4.c -Os (internal compiler error: Segmentation fault) +FAIL: gcc.dg/torture/vec-cvt-1.c -Os (internal compiler error: Segmentation fault) +FAIL: c-c++-common/torture/complex-sign-mixed-add.c -Os (internal compiler error: Segmentation fault) +FAIL: c-c++-common/torture/complex-sign-mixed-div.c -Os (internal compiler error: Segmentation fault) +FAIL: c-c++-common/torture/complex-sign-mixed-sub.c -Os (internal compiler error: Segmentation fault) +FAIL: gfortran.dg/bind-c-contiguous-1.f90 -Os (internal compiler error: Segmentation fault) +FAIL: gfortran.dg/bind-c-contiguous-4.f90 -Os (internal compiler error: Segmentation fault) +FAIL: gfortran.dg/minlocval_4.f90 -Os (internal compiler error: Segmentation fault) they all have a backtrace like this: /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c: In function 'test_main_long_double_postinc': /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c:73:1: internal compiler error: Segmentation fault /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c:97:1: note: in expansion of macro 'TEST_FUNCS' 0xf0493f crash_signal /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/toplev.cc:319 0x7fcc65b98d5f ??? 
./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 0x98cd63 lookup_page_table_entry /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/ggc-page.cc:630 0x98cd63 ggc_set_mark(void const*) /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/ggc-page.cc:1553 0x12b31bd gt_ggc_mx_hash_map_rtx_int_(void*) ./gt-xtensa.h:39 0xc19207 gt_ggc_mx_function(void*) /home/jcmvbkbc/ws/tensilica/gcc/builds/gcc-15-1382-g448482d3d5c2-xtensa-call0-le/gcc/gtype-desc.cc:1696 0xc19207 gt_ggc_mx_function(void*) /home/jcmvbkbc/ws/tensilica/gcc/builds/gcc-15-1382-g448482d3d5c2-xtensa-call0-le/gcc/gtype-desc.cc:1680
[c-family] Add minimal support for __bf16 to -fdump-ada-spec
Tested on x86-64/Linux, applied on the mainline. 2024-06-17 Eric Botcazou c-family/ * c-ada-spec.cc (is_float16): New predicate. (dump_ada_node) : Call it. -- Eric Botcazou
diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc index a41e93aeafb..e1b1b2a4b73 100644 --- a/gcc/c-family/c-ada-spec.cc +++ b/gcc/c-family/c-ada-spec.cc @@ -2077,6 +2077,22 @@ dump_ada_enum_type (pretty_printer *pp, tree node, tree type, int spc) } } +/* Return true if NODE is the __bf16 type. */ + +static bool +is_float16 (tree node) +{ + if (!TYPE_NAME (node) || TREE_CODE (TYPE_NAME (node)) != TYPE_DECL) +return false; + + tree name = DECL_NAME (TYPE_NAME (node)); + + if (IDENTIFIER_POINTER (name) [0] != '_') +return false; + + return id_equal (name, "__bf16"); +} + /* Return true if NODE is the _Float32/_Float32x type. */ static bool @@ -2210,7 +2226,12 @@ dump_ada_node (pretty_printer *pp, tree node, tree type, int spc, break; case REAL_TYPE: - if (is_float32 (node)) + if (is_float16 (node)) + { + pp_string (pp, "Short_Float"); + break; + } + else if (is_float32 (node)) { pp_string (pp, "Float"); break;
Re: [c-family] Add minimal support for __bf16 to -fdump-ada-spec
On Mon, Jun 17, 2024 at 2:29 PM Eric Botcazou wrote: > > Tested on x86-64/Linux, applied on the mainline. > > > 2024-06-17 Eric Botcazou > > c-family/ > * c-ada-spec.cc (is_float16): New predicate. > (dump_ada_node) : Call it. Hmm, the name is_float16 suggests to me _Float16 rather than __bf16. Those are two different formats; both could be supported on a target (both aarch64 and x86_64 support both at the same time). Also for __bf16, I think comparing the format against arm_bfloat_half_format would be a better choice than depending on the name. Thanks, Andrew Pinski > > -- > Eric Botcazou
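For readers following the naming concern above, here is a small standalone C illustration of why __bf16 and _Float16 should not be conflated. It is not GCC code and not part of the patch. It assumes that float is IEEE binary32, that bfloat16 is the upper 16 bits of that encoding (1 sign, 8 exponent, 7 mantissa bits), and that IEEE binary16 uses 1 sign, 5 exponent and 10 mantissa bits. The helper names are hypothetical, and the conversion truncates rather than rounds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bfloat16 bit pattern of F, obtained by truncating
   (not rounding) the binary32 encoding to its upper 16 bits.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}

/* Hypothetical helper: widen a bfloat16 bit pattern back to float.  */
static float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Bit patterns of 1.0 in each 16-bit format.  */
enum { BF16_ONE = 0x3f80, IEEE_HALF_ONE = 0x3c00 };

/* The same value has different encodings in the two formats, so a predicate
   keyed on one of them should not be named after the other.  */
static void
check_formats_differ (void)
{
  assert (float_to_bf16_bits (1.0f) == BF16_ONE);
  assert (BF16_ONE != IEEE_HALF_ONE);
}
```

Since both formats are 16 bits wide, size alone cannot tell them apart, which is consistent with the suggestion to compare the real format rather than the type name.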
[C PATCH, v4] Fix for redeclared enumerator initialized with different type [PR115109]
This is a new version of the patch. This adds the -fno-short-enums flag to the tests. I will commit it if the CI for arm does not complain this time. Bootstrapped and regression tested on x86_64. c23: Fix for redeclared enumerator initialized with different type [PR115109] c23 specifies that the type of a redeclared enumerator is the one of the previous declaration. Convert initializers with different type accordingly and emit an error when the value does not fit. 2024-06-01 Martin Uecker PR c/115109 gcc/c/ * c-decl.cc (build_enumerator): When redeclaring an enumerator convert value to previous type. For redeclared enumerators use underlying type for computing the next value. gcc/testsuite/ * gcc.dg/pr115109.c: New test. * gcc.dg/c23-tag-enum-6.c: New test. * gcc.dg/c23-tag-enum-7.c: New test. commit c8a0ec5150299689e6e36b0044ea811b82d90b2f Author: Martin Uecker Date: Sat May 18 22:00:04 2024 +0200 c23: Fix for redeclared enumerator initialized with different type [PR115109] c23 specifies that the type of a redeclared enumerator is the one of the previous declaration. Convert initializers with different type accordingly and emit an error when the value does not fit. 2024-06-01 Martin Uecker PR c/115109 gcc/c/ * c-decl.cc (build_enumerator): When redeclaring an enumerator convert value to previous type. For redeclared enumerators use underlying type for computing the next value. gcc/testsuite/ * gcc.dg/pr115109.c: New test. * gcc.dg/c23-tag-enum-6.c: New test. * gcc.dg/c23-tag-enum-7.c: New test. diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc index 6c09eb73128..01326570e2b 100644 --- a/gcc/c/c-decl.cc +++ b/gcc/c/c-decl.cc @@ -10277,6 +10277,7 @@ build_enumerator (location_t decl_loc, location_t loc, struct c_enum_contents *the_enum, tree name, tree value) { tree decl; + tree old_decl; /* Validate and default VALUE. */ @@ -10336,6 +10337,23 @@ build_enumerator (location_t decl_loc, location_t loc, definition. 
*/ value = convert (the_enum->enum_type, value); } + else if (flag_isoc23 + && (old_decl = lookup_name_in_scope (name, current_scope)) + && old_decl != error_mark_node + && TREE_TYPE (old_decl) + && TREE_TYPE (TREE_TYPE (old_decl)) + && TREE_CODE (old_decl) == CONST_DECL) +{ + /* Enumeration constants in a redeclaration have the previous type. */ + tree previous_type = TREE_TYPE (DECL_INITIAL (old_decl)); + if (!int_fits_type_p (value, previous_type)) + { + error_at (loc, "value of redeclared enumerator outside the range " +"of %qT", previous_type); + locate_old_decl (old_decl); + } + value = convert (previous_type, value); +} else { /* Even though the underlying type of an enum is unspecified, the @@ -10402,9 +10420,14 @@ build_enumerator (location_t decl_loc, location_t loc, false); } else -the_enum->enum_next_value - = build_binary_op (EXPR_LOC_OR_LOC (value, input_location), -PLUS_EXPR, value, integer_one_node, false); +{ + /* In a redeclaration the type can already be the enumeral type. 
*/ + if (TREE_CODE (TREE_TYPE (value)) == ENUMERAL_TYPE) + value = convert (ENUM_UNDERLYING_TYPE (TREE_TYPE (value)), value); + the_enum->enum_next_value + = build_binary_op (EXPR_LOC_OR_LOC (value, input_location), + PLUS_EXPR, value, integer_one_node, false); +} the_enum->enum_overflow = tree_int_cst_lt (the_enum->enum_next_value, value); if (the_enum->enum_overflow && !ENUM_FIXED_UNDERLYING_TYPE_P (the_enum->enum_type)) diff --git a/gcc/testsuite/gcc.dg/c23-tag-enum-6.c b/gcc/testsuite/gcc.dg/c23-tag-enum-6.c new file mode 100644 index 000..29aef7ee3fd --- /dev/null +++ b/gcc/testsuite/gcc.dg/c23-tag-enum-6.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c23 -fno-short-enums" } */ + +#include + +enum E : int { a = 1, b = 2 }; +enum E : int { b = _Generic(a, enum E: 2), a = 1 }; + +enum H { x = 1 }; +enum H { x = 2UL + UINT_MAX }; /* { dg-error "outside the range" } */ + +enum K : int { z = 1 }; +enum K : int { z = 2UL + UINT_MAX }; /* { dg-error "outside the range" } */ + +enum F { A = 0, B = UINT_MAX }; +enum F { B = UINT_MAX, A };/* { dg-error "outside the range" } */ + +enum G : unsigned int { C = 0, D = UINT_MAX }; +enum G : unsigned int { D = UINT_MAX, C }; /* { dg-error "overflow" } */ + diff --git a/gcc/testsuite/gcc.dg/c23-tag-enum-7.c b/gcc/testsuite/gcc.dg/c23-tag-enum-7.c new file mode 100644 index 000..d4c787c8f71 --- /dev/n
Re: [wwwdocs,pushed] backends.html - Update weblinks to AVR simulators
On Sat, 15 Jun 2024, Georg-Johann Lay wrote: > Applied this one: Cool. > +SimulAVR at <a href="https://www.nongnu.org/simulavr" This one gives an HTTP response of "301 Moved Permanently" redirecting to https://www.nongnu.org/simulavr/ . I'll fix this in a minute. On a related note, though, can we update the references to the simulators from (for example) +avrtest at +<a href="https://github.com/sprintersb/atest" +>https://github.com/sprintersb/atest</a> to +<a href="https://github.com/sprintersb/atest">avrtest</a> ? Thanks, Gerald
[pushed] wwwdocs: backends: Adjust SimulAVR link
The original link gives a "301 Moved Permanently", easily fixed by appending a slash. Pushed. Gerald --- htdocs/backends.html | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/htdocs/backends.html b/htdocs/backends.html index 1f7c85d7..d86783a6 100644 --- a/htdocs/backends.html +++ b/htdocs/backends.html @@ -128,8 +128,8 @@ xtensa | C https://github.com/sprintersb/atest?tab=readme-ov-file#running-the-avr-gcc-testsuite-using-the-avrtest-simulator"; >README: Running the avr-gcc Testsuite using the avrtest Simulator -SimulAVR at https://www.nongnu.org/simulavr"; - >https://www.nongnu.org/simulavr +SimulAVR at https://www.nongnu.org/simulavr/"; + >https://www.nongnu.org/simulavr/ -- 2.45.2
Re: [PATCH] rs6000: Shrink rs6000_init_generated_builtins size [PR115324]
Hi! Thanks for posting this again. Much easier to find that way :-) On Mon, Jun 17, 2024 at 07:15:48PM +0200, Jakub Jelinek wrote: > While my r15-1001-g4cf2de9b5268224 PCH PIE power fix change decreased the > .data section sizes (219792 -> 189336), it increased the size of already > huge rs6000_init_generated_builtins generated function, from 218328 > to 228668 bytes. That is because there are thousands of array references > to global arrays and we keep constructing the addresses of the arrays > again and again. Less than 5%, for some perspective ;-) > Ideally some optimization would figure out we have a single function which > has > 461 rs6000_overload_info >1257 rs6000_builtin_info_fntype >1768 rs6000_builtin_decls >2548 rs6000_instance_info_fntype > array references and that maybe it might be a good idea to just preload > the addresses of those arrays into some register if it decreases code size > and doesn't slow things down. > The function actually is called just once and is huge, so code size is even > more important than speed, which is dominated by all the GC allocations > anyway. Yup. > Until that is done, here is a slightly cleaner version of the hack, which > makes the function noipa (so that LTO doesn't undo it) for GCC 8.1+ and > passes the 4 arrays as arguments to the function from the caller. > This decreases the function size from 228668 bytes to 207572 bytes. > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? > 2024-06-17 Jakub Jelinek > > PR target/115324 > * config/rs6000/rs6000-gen-builtins.cc (write_decls): Change > declaration of rs6000_init_generated_builtins from no arguments > to 4 pointer arguments. > (write_init_bif_table): Change rs6000_builtin_info_fntype to > builtin_info_fntype and rs6000_builtin_decls to builtin_decls. > (write_init_ovld_table): Change rs6000_instance_info_fntype to > instance_info_fntype, rs6000_builtin_decls to builtin_decls and > rs6000_overload_info to overload_info. 
> (write_init_file): Add __noipa__ attribute to > rs6000_init_generated_builtins for GCC 8.1+ and change the function > from no arguments to 4 pointer arguments. Change rs6000_builtin_decls > to builtin_decls. > * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Adjust > rs6000_init_generated_builtins caller. It would have been much easier to review if you had done the renaming in a separate patch :-) You typically notice such things when writing the changelog is much harder than expected, and this is the True Value of changelogs! Seen from the other side, when reviewing a patch I like to start with the changelog (after the commit message); it should tell everything there is to know, and then if something in the actual patch surprises me, something is not ideal, or wrong even. > + /* The reason to pass pointers to the function instead of accessing > + the rs6000_{{builtin,instance}_info_fntype,overload_info,builtin_decls} > + arrays directly is to decrease size of the already large function and > + noipa prevents the compiler with LTO to undo that optimization. */ Some of these array names no longer have the rs6000_ prefix now. Oh wait, you already took that into account? I'm not saying anything :-) The patch is fine for trunk, thank you! If you want backports those are okay, too (but I don't think you want any? Or does this work without the previous patches as well?) Segher
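The code-size trick under review can be illustrated outside of GCC. The following is only a sketch under stated assumptions: the table names, DEMO_NOIPA and demo_init_tables are made up for illustration, the version check is a simplified stand-in for the "GCC 8.1+" condition in the patch, and nothing here reflects the real generated rs6000 code. The idea is that the big initializer receives the table addresses as arguments instead of rematerializing the address of each global array at every reference, while noipa keeps interprocedural optimization (including LTO) from propagating the known addresses back in.

```c
#include <assert.h>
#include <stddef.h>

/* noipa is only applied where it is known to exist; elsewhere the macro
   expands to nothing and the example still compiles.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 8
# define DEMO_NOIPA __attribute__ ((__noipa__))
#else
# define DEMO_NOIPA
#endif

#define DEMO_N 4

/* Stand-ins for the large global tables.  */
int demo_fntype[DEMO_N];
int demo_decls[DEMO_N];

/* The initializer only touches the tables through its pointer arguments,
   so each table address is set up once, in the caller.  */
DEMO_NOIPA void
demo_init_tables (int *fntype, int *decls, size_t n)
{
  assert (fntype != NULL && decls != NULL);
  for (size_t i = 0; i < n; i++)
    {
      fntype[i] = (int) i * 2;
      decls[i] = fntype[i] + 1;
    }
}

/* The single caller passes the global arrays explicitly.  */
void
demo_init (void)
{
  demo_init_tables (demo_fntype, demo_decls, DEMO_N);
}
```

Whether this actually shrinks the code depends on the target's addressing modes; the byte counts quoted in the thread are for powerpc64le and should not be read into this toy example.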
Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]
Hi! On Mon, Jun 17, 2024 at 05:26:39PM -0500, Peter Bergner wrote: > While auditing our ROP code generation for some test cases I wrote, I noticed > a few issues which I'm tracking in PR114759. The first issue I noticed is we > disable shrink-wrapping when using -mrop-protect, even in the cases where we > never emit the ROP instructions because they're not needed. Please don't call this "ROP instructions". -mrop-protect tries to make it much harder to successfully do exploits in a style called "return-oriented programming", starting from a stack overwrite normally. It does this by hashing the return address together with the stack pointer value and with the previous hash value (so the whole call stack is hashed), and checking that before returning. "ROP insns" are the instructions used in such exploits, not what you mean here :-) The instructions are called "hash*", so maybe call them "hash insns" or "ROP protect hash insns"? > The problem is > we disable shrink-wrapping too early, before we know whether we will need to > emit the ROP instructions or not. The fix is to delay disabling shrink > wrapping until we've decided whether we will or won't be emitting the ROP > instructions. > * config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move > the disabling of shrink-wrapping from here > * config/rs6000/rs6000-logue.cc (rs6000_stack_info): ...to here. Hrm. Can you do it in some particular caller of rs6000_stack_info, instead? The rs6000_stack_info function itself is not supposed to change any state whatsoever. > --- a/gcc/config/rs6000/rs6000-logue.cc > +++ b/gcc/config/rs6000/rs6000-logue.cc > @@ -720,7 +720,11 @@ rs6000_stack_info (void) >&& info->calls_p >&& DEFAULT_ABI == ABI_ELFv2 >&& rs6000_rop_protect) > -info->rop_hash_size = 8; > +{ > + /* If we are inserting ROP-protect instructions, disable shrink wrap. > */ > + flag_shrink_wrap = 0; > + info->rop_hash_size = 8; > +} The comment should say *why*! 
The fact that we do is clear from the code itself already. But why do we want this? > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void) > } >else if (!OPTION_SET_P (flag_cunroll_grow_size)) > flag_cunroll_grow_size = flag_peel_loops || optimize >= 3; > - > - /* If we are inserting ROP-protect instructions, disable shrink wrap. */ > - if (rs6000_rop_protect) > -flag_shrink_wrap = 0; > } (Yes, I know the original code didn't say either, but let's try to make things better :-) ) > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect > -fdump-rtl-pro_and_epilogue" } */ > +/* { dg-require-effective-target rop_ok } */ Do you want rop_ok while you are *forcing* it to be okay anyway? Why? Segher
Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]
On 6/16/24 9:40 PM, Kewen.Lin wrote: > on 2024/6/17 10:31, Peter Bergner wrote: >> On 6/16/24 9:10 PM, Kewen.Lin wrote: >>> on 2024/6/15 01:05, Peter Bergner wrote: >>>> That said, the --with-cpu=power5 build without fortran did bootstrap and regtest with no regressions, so the build did test that code path and exposed no problems. >>> >>> OK, nice! Thanks! >> >> I assume this means you're "OK" with the updated patch, correct? > > Yes, OK for trunk, thanks! Thanks. We will need backports back to GCC 11, since this has been broken ever since ROP support was first added. I'll let things burn-in on trunk for a couple of days so Bill's CI builders have a chance to test it on all of our configs. >> Do you want to take a stab at writing that or do you want me to do that? > > Either is fine for me, then let me give it a shot. Sounds good, thanks. That will allow me to handle the other ROP issues I came across, which are reported in PR114759. Peter
Re: [PATCH V3 1/2] RISC-V: Fix vwsll combine on rv32 targets
On 6/17/24 12:33 PM, Edwin Lu wrote:
> On rv32 targets, vwsll_zext1_scalar_ would trigger an ICE in maybe_legitimize_instruction when zero extending a uint32 to uint64 due to a mismatch between the input operand's mode (DI) and the expanded insn operand's mode (Pmode == SI). Ensure that the modes of the operands match.
>
> Tested on rv32/64 gcv newlib. Letting CI perform additional testing.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Fix mode mismatch

OK
jeff
Re: [PATCH V3 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn
On 6/17/24 12:33 PM, Edwin Lu wrote:
> When emitting insns, we have an early assertion to ensure the input operand's mode and the expanded operand's mode are the same; however, it does not perform this check if the pattern does not have an explicit machine mode specifying the operand. In this scenario, it will always assume that mode = Pmode to correctly satisfy the maybe_legitimize_operand check; however, there may be problems when working in 32-bit environments. Make the assert unconditional and replace it with an internal error for more descriptive logging.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc: Move assert out of conditional block

OK.
Jeff
[COMMITTED] aarch64: Add testcase for PR97405
This aarch64 SVE-specific code was fixed by r15-917-gc9842f99042454, which added a riscv-specific testcase, so adding an aarch64 one to check that the fix does not regress is a good idea. Committed as obvious after testing the testcase for aarch64-linux-gnu. PR tree-optimization/97405 gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pr97405-1.c: New test. Signed-off-by: Andrew Pinski --- gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c | 13 + 1 file changed, 13 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c new file mode 100644 index 000..5efa32c9928 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-march=armv8.2-a+sve -O2" } +/* PR tree-optimization/97405 */ +#include "arm_sve.h" + +void +a (svuint8x3_t b, unsigned char *p, int c) { + if (c) +svst1_u8(svptrue_pat_b8(SV_VL16), p, svget3_u8(b, 1)); + else +svst1_u8(svwhilelt_b8(6, 6), p, svget3_u8(b, 1)); +} + -- 2.43.0
[RFC v3] RISC-V: Promote Zaamo/Zalrsc to a when using an old binutils
Binutils 2.42 and before don't support Zaamo/Zalrsc. Promote Zaamo/Zalrsc to 'a' in the -march string when assembling. This change respects Zaamo/Zalrsc when generating code. Testcases that check for the default isa string will fail with the old binutils since zaamo/zalrsc aren't emitted anymore. All other Zaamo/Zalrsc testcases pass. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::to_string): Add toggle to promote Zaamo/Zalrsc extensions to 'a'. (riscv_arch_str): Ditto. (riscv_expand_arch): Ditto. (riscv_expand_arch_from_cpu): Ditto. (riscv_expand_arch_upgrade_exts): New function. Wrapper around riscv_expand_arch to preserve the function signature. (riscv_expand_arch_no_upgrade_exts): Ditto (riscv_expand_arch_from_cpu_upgrade_exts): New function. Wrapper around riscv_expand_arch_from_cpu to preserve the function signature. (riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto. * config/riscv/riscv-protos.h (riscv_arch_str): Add toggle to function prototype. * config/riscv/riscv-subset.h: Ditto. * config/riscv/riscv-target-attr.cc (riscv_process_target_attr): * config/riscv/riscv.cc (riscv_emit_attribute): (riscv_declare_function_name): * config/riscv/riscv.h (riscv_expand_arch): Remove. (riscv_expand_arch_from_cpu): Ditto. (riscv_expand_arch_upgrade_exts): Add toggle wrapper functions. (riscv_expand_arch_no_upgrade_exts): Ditto. (riscv_expand_arch_from_cpu_upgrade_exts): Ditto. (riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto. (EXTRA_SPEC_FUNCTIONS): Ditto. (OPTION_DEFAULT_SPECS): Use non-upgraded march string when invoking the compiler. (ASM_SPEC): Use upgraded march string when invoking the assembler. Signed-off-by: Patrick O'Neill --- v3 ChangeLog: Rebased on non-promoting patch. Wrap all Zaamo/Zalrsc upgrade code in #ifndef to prevent compiler warnings about unused/potentially undefined variables. Silence unused parameter warning with a voidcast. 
--- RFC since I'm not sure if this upgrade behavior is more trouble than it's worth - this is a pretty invasive change. Happy to iterate further or just drop these changes. --- gcc/common/config/riscv/riscv-common.cc | 111 +--- gcc/config/riscv/riscv-protos.h | 3 +- gcc/config/riscv/riscv-subset.h | 2 +- gcc/config/riscv/riscv-target-attr.cc | 4 +- gcc/config/riscv/riscv.cc | 7 +- gcc/config/riscv/riscv.h| 46 ++ 6 files changed, 137 insertions(+), 36 deletions(-) diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 1dc1d9904c7..05c26f73b73 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -907,7 +907,7 @@ riscv_subset_list::add (const char *subset, bool implied_p) VERSION_P to determine append version info or not. */ std::string -riscv_subset_list::to_string (bool version_p) const +riscv_subset_list::to_string (bool version_p, bool upgrade_exts) const { std::ostringstream oss; oss << "rv" << m_xlen; @@ -916,10 +916,17 @@ riscv_subset_list::to_string (bool version_p) const riscv_subset_t *subset; bool skip_zifencei = false; - bool skip_zaamo_zalrsc = false; bool skip_zicsr = false; bool i2p0 = false; +#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC + bool upgrade_zaamo_zalrsc = false; + bool has_a_ext = false; + bool insert_a_ext = false; + bool inserted_a_ext = false; + riscv_subset_t *a_subset; +#endif + /* For RISC-V ISA version 2.2 or earlier version, zicsr and zifencei is included in the base ISA. */ if (riscv_isa_spec == ISA_SPEC_CLASS_2P2) @@ -945,8 +952,33 @@ riscv_subset_list::to_string (bool version_p) const skip_zifencei = true; #endif #ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC - /* Skip since binutils 2.42 and earlier don't recognize zaamo/zalrsc. */ - skip_zaamo_zalrsc = true; + /* Upgrade Zaamo/Zalrsc extensions to 'a' since binutils 2.42 and earlier + don't recognize zaamo/zalrsc. 
*/ + upgrade_zaamo_zalrsc = upgrade_exts; + if (upgrade_zaamo_zalrsc) +{ + for (subset = m_head; subset != NULL; subset = subset->next) + { + if (subset->name == "a") + has_a_ext = true; + if (subset->name == "zaamo" || subset->name == "zalrsc") + insert_a_ext = true; + } + if (insert_a_ext && !has_a_ext) + { + unsigned int major_version = 0, minor_version = 0; + get_default_version ("a", &major_version, &minor_version); + a_subset = new riscv_subset_t (); + a_subset->name = "a"; + a_subset->implied_p = false; + a_subset->major_version = major_version; + a_subset->minor_version = minor_version; + } +} +#else + /* Silence unused parameter warning when HAV
RE: [PATCH] aarch64: Add fix_truncv4sfv4hi2 pattern [PR113882]
> Pengxuan Zheng writes: > > This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is > > implemented using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2 > (V4SI->V4HI). > > > > PR target/113882 > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-simd.md (fix_truncv4sfv4hi2): New pattern. > > Could we handle this by extending the target-independent code instead? > Richard mentioned in comment 1 that the current set of intermediate > conversions is hard-coded, but it didn't sound like he was implying that the > set shouldn't change. Yes, Richard. I checked the target-independent code. In fact, SLP already handles this type of intermediate conversions. However, the logic is guarded by "!flag_trapping_math". Therefore, if we pass -fno-trapping-math , SLP actually generates the right vectorized code. Also, looks like the check for "!flag_trapping_math" was added intentionally in r14-2085-g77a50c772771f6 to fix some PRs. So, I'm not sure what we should do here. Thoughts? if (GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode) && (code == FLOAT_EXPR || (code == FIX_TRUNC_EXPR && !flag_trapping_math))) Thanks, Pengxuan > > Thanks, > Richard > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/fix_trunc2.c: New test. 
> > > > Signed-off-by: Pengxuan Zheng > > --- > > gcc/config/aarch64/aarch64-simd.md| 13 + > > gcc/testsuite/gcc.target/aarch64/fix_trunc2.c | 14 ++ > > 2 files changed, 27 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/fix_trunc2.c > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > b/gcc/config/aarch64/aarch64-simd.md > > index 868f4486218..096f7b56a27 100644 > > --- a/gcc/config/aarch64/aarch64-simd.md > > +++ b/gcc/config/aarch64/aarch64-simd.md > > @@ -3032,6 +3032,19 @@ (define_expand > "2" > >"TARGET_SIMD" > >{}) > > > > + > > +(define_expand "fix_truncv4sfv4hi2" > > + [(match_operand:V4HI 0 "register_operand") > > + (match_operand:V4SF 1 "register_operand")] > > + "TARGET_SIMD" > > + { > > +rtx tmp = gen_reg_rtx (V4SImode); > > +emit_insn (gen_fix_truncv4sfv4si2 (tmp, operands[1])); > > +emit_insn (gen_truncv4siv4hi2 (operands[0], tmp)); > > +DONE; > > + } > > +) > > + > > (define_expand "ftrunc2" > >[(set (match_operand:VHSDF 0 "register_operand") > > (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")] diff > > --git a/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c > > b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c > > new file mode 100644 > > index 000..57cc00913a3 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c > > @@ -0,0 +1,14 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > + > > +void > > +f (short *__restrict a, float *__restrict b) { > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > +} > > + > > +/* { dg-final { scan-assembler-times {fcvtzs\tv[0-9]+.4s, v[0-9]+.4s} > > +1 } } */ > > +/* { dg-final { scan-assembler-times {xtn\tv[0-9]+.4h, v[0-9]+.4s} 1 > > +} } */
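To make the discussion above concrete, here is a plain C sketch of what the proposed expander does per element: convert float to a 32-bit integer first, then narrow to 16 bits, which is what the fcvtzs plus xtn sequence in the test checks for. This is only a scalar illustration with hypothetical function names, not the vectorizer's or the backend's code. For inputs whose truncated value fits in a short, the two-step and the direct conversion agree.

```c
#include <assert.h>
#include <stddef.h>

/* Two-step conversion: float -> int (the fcvtzs step), then int -> short
   (the xtn step).  */
static void
cvt_two_step (short *a, const float *b)
{
  assert (a != NULL && b != NULL);
  for (int i = 0; i < 4; i++)
    {
      int tmp = (int) b[i];
      a[i] = (short) tmp;
    }
}

/* Direct conversion, as written in the fix_trunc2.c test.  */
static void
cvt_direct (short *a, const float *b)
{
  assert (a != NULL && b != NULL);
  for (int i = 0; i < 4; i++)
    a[i] = b[i];
}
```

The two forms can differ only for values outside the range of short: there the direct conversion is undefined in C and would be expected to signal an invalid-operation exception, while the route through int need not. That is one plausible reading of why the generic SLP path is guarded by !flag_trapping_math, but it is an inference from the thread and not something the quoted code states.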
[PATCH] function.h: eliminate macros "dom_computed" and "n_bbs_in_dom_tree"
Be explicit when we use "cfun". No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * dominance.cc (compute_dom_fast_query): Replace uses of "dom_computed" macro with explicit use of cfun. (compute_dom_fast_query_in_region): Likewise. (calculate_dominance_info): Likewise, also for macro "n_bbs_in_dom_tree". (calculate_dominance_info_for_region): Likewise for "dom_computed" macro. (get_immediate_dominator): Likewise. (set_immediate_dominator): Likewise. (get_dominated_by): Likewise. (redirect_immediate_dominators): Likewise. (nearest_common_dominator): Likewise. (dominated_by_p): Likewise. (bb_dom_dfs_in): Likewise. (bb_dom_dfs_out): Likewise. (recompute_dominator): Likewise. (iterate_fix_dominators): Likewise. (add_to_dominance_info): Likewise, also for macro "n_bbs_in_dom_tree". (delete_from_dominance_info): Likewise. (set_dom_info_availability): Likewise for "dom_computed" macro. * function.h (dom_computed): Delete macro. (n_bbs_in_dom_tree): Delete macro. 
Signed-off-by: David Malcolm --- gcc/dominance.cc | 70 +--- gcc/function.h | 3 --- 2 files changed, 36 insertions(+), 37 deletions(-) diff --git a/gcc/dominance.cc b/gcc/dominance.cc index 0357210ed27f..528b38caa9db 100644 --- a/gcc/dominance.cc +++ b/gcc/dominance.cc @@ -672,7 +672,7 @@ compute_dom_fast_query (enum cdi_direction dir) gcc_checking_assert (dom_info_available_p (dir)); - if (dom_computed[dir_index] == DOM_OK) + if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK) return; FOR_ALL_BB_FN (bb, cfun) @@ -681,7 +681,7 @@ compute_dom_fast_query (enum cdi_direction dir) assign_dfs_numbers (bb->dom[dir_index], &num); } - dom_computed[dir_index] = DOM_OK; + cfun->cfg->x_dom_computed[dir_index] = DOM_OK; } /* Analogous to the previous function but compute the data for reducible @@ -697,7 +697,7 @@ compute_dom_fast_query_in_region (enum cdi_direction dir, gcc_checking_assert (dom_info_available_p (dir)); - if (dom_computed[dir_index] == DOM_OK) + if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK) return; /* Assign dfs numbers for region nodes except for entry and exit nodes. */ @@ -708,7 +708,7 @@ compute_dom_fast_query_in_region (enum cdi_direction dir, assign_dfs_numbers (bb->dom[dir_index], &num); } - dom_computed[dir_index] = DOM_OK; + cfun->cfg->x_dom_computed[dir_index] = DOM_OK; } /* The main entry point into this module. 
DIR is set depending on whether @@ -721,7 +721,7 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query) { unsigned int dir_index = dom_convert_dir_to_idx (dir); - if (dom_computed[dir_index] == DOM_OK) + if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK) { checking_verify_dominators (dir); return; @@ -730,14 +730,14 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query) timevar_push (TV_DOMINANCE); if (!dom_info_available_p (dir)) { - gcc_assert (!n_bbs_in_dom_tree[dir_index]); + gcc_assert (!cfun->cfg->x_n_bbs_in_dom_tree[dir_index]); basic_block b; FOR_ALL_BB_FN (b, cfun) { b->dom[dir_index] = et_new_tree (b); } - n_bbs_in_dom_tree[dir_index] = n_basic_blocks_for_fn (cfun); + cfun->cfg->x_n_bbs_in_dom_tree[dir_index] = n_basic_blocks_for_fn (cfun); dom_info di (cfun, dir); di.calc_dfs_tree (); @@ -749,7 +749,7 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query) et_set_father (b->dom[dir_index], d->dom[dir_index]); } - dom_computed[dir_index] = DOM_NO_FAST_QUERY; + cfun->cfg->x_dom_computed[dir_index] = DOM_NO_FAST_QUERY; } else checking_verify_dominators (dir); @@ -772,7 +772,7 @@ calculate_dominance_info_for_region (cdi_direction dir, basic_block bb; unsigned int i; - if (dom_computed[dir_index] == DOM_OK) + if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK) return; timevar_push (TV_DOMINANCE); @@ -791,7 +791,7 @@ calculate_dominance_info_for_region (cdi_direction dir, if (basic_block d = di.get_idom (bb)) et_set_father (bb->dom[dir_index], d->dom[dir_index]); - dom_computed[dir_index] = DOM_NO_FAST_QUERY; + cfun->cfg->x_dom_computed[dir_index] = DOM_NO_FAST_QUERY; compute_dom_fast_query_in_region (dir, region); timevar_pop (TV_DOMINANCE); @@ -858,7 +858,7 @@ get_immediate_dominator (enum cdi_direction dir, basic_block bb) unsigned int dir_index = dom_convert_dir_to_idx (dir); struct et_node *node = bb->dom[dir_index]; - gcc_checking_assert (dom_computed[dir_index]); + gcc_checking_assert 
(cfun->cfg->x_dom_computed[dir_index]); if (!node->father)
[PATCH 00/11] CodeView variables and type system
This patch series adds support for outputting global variables when the -gcodeview option is provided, along with the type system to go along with this. As with previous patches, the best way to see the output is to run Microsoft's cvdump.exe against the object file: https://github.com/microsoft/microsoft-pdb/raw/master/cvdump/cvdump.exe You'll also need a recentish version of binutils in order to get ld to output an actual PDB file that can be read by MSVC or windbg. This ought to be fairly complete as far as C is concerned. Still to come are functions, local variables, and some C++ things.

Mark Harmstone (11):
  Output CodeView data about variables
  Handle CodeView base types
  Handle typedefs for CodeView
  Handle pointers for CodeView
  Handle const and varible modifiers for CodeView
  Handle enums for CodeView
  Handle structs and classes for CodeView
  Handle unions for CodeView.
  Handle arrays for CodeView
  Handle bitfields for CodeView
  Handle subroutine types in CodeView

 gcc/dwarf2codeview.cc | 2278 -
 gcc/dwarf2codeview.h | 67 ++
 gcc/dwarf2out.cc |5 +
 3 files changed, 2341 insertions(+), 9 deletions(-)
-- 2.44.2
[PATCH 04/11] Handle pointers for CodeView
Translates DW_TAG_pointer_type DIEs into LF_POINTER symbols, which get output into the .debug$T section. gcc/ * dwarf2codeview.cc (FIRST_TYPE): Define. (struct codeview_custom_type): New structure. (custom_types, last_custom_type): New variables. (get_type_num): Prototype. (write_lf_pointer, write_custom_types): New functions. (codeview_debug_finish): Call write_custom_types. (add_custom_type, get_type_num_pointer_type): New functions. (get_type_num): Handle DW_TAG_pointer_type DIEs. * dwarf2codeview.h (T_VOID): Define. (CV_POINTER_32, CV_POINTER_64): Likewise. (T_32PVOID, T_64PVOID): Likewise. (CV_PTR_NEAR32, CV_PTR64, LF_POINTER): Likewise. --- gcc/dwarf2codeview.cc | 179 +- gcc/dwarf2codeview.h | 13 +++ 2 files changed, 188 insertions(+), 4 deletions(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 5006a176260..51401f2d5bc 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see #define CV_CFL_C 0x00 #define CV_CFL_CXX 0x01 +#define FIRST_TYPE 0x1000 + #define LINE_LABEL "Lcvline" #define END_FUNC_LABEL "Lcvendfunc" #define SYMBOL_START_LABEL "Lcvsymstart" @@ -168,6 +170,22 @@ struct die_hasher : free_ptr_hash } }; +struct codeview_custom_type +{ + struct codeview_custom_type *next; + uint32_t num; + uint16_t kind; + + union + { +struct +{ + uint32_t base_type; + uint32_t attributes; +} lf_pointer; + }; +}; + static unsigned int line_label_num; static unsigned int func_label_num; static unsigned int sym_label_num; @@ -181,6 +199,9 @@ static const char* last_filename; static uint32_t last_file_id; static codeview_symbol *sym, *last_sym; static hash_table *types_htab; +static codeview_custom_type *custom_types, *last_custom_type; + +static uint32_t get_type_num (dw_die_ref type); /* Record new line number against the current function. */ @@ -845,6 +866,71 @@ write_codeview_symbols (void) asm_fprintf (asm_out_file, "%LLcv_syms_end:\n"); } +/* Write an LF_POINTER type. 
*/ + +static void +write_lf_pointer (codeview_custom_type *t) +{ + /* This is lf_pointer in binutils and lfPointer in Microsoft's cvinfo.h: + +struct lf_pointer +{ + uint16_t size; + uint16_t kind; + uint32_t base_type; + uint32_t attributes; +} ATTRIBUTE_PACKED; + */ + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n", + t->num, t->num); + + asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, t->kind); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_pointer.base_type); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_pointer.attributes); + putc ('\n', asm_out_file); + + asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); +} + +/* Write the .debug$T section, which contains all of our custom type + definitions. */ + +static void +write_custom_types (void) +{ + targetm.asm_out.named_section (".debug$T", SECTION_DEBUG, NULL); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, CV_SIGNATURE_C13); + putc ('\n', asm_out_file); + + while (custom_types) +{ + codeview_custom_type *n = custom_types->next; + + switch (custom_types->kind) + { + case LF_POINTER: + write_lf_pointer (custom_types); + break; + } + + free (custom_types); + custom_types = n; +} +} + /* Finish CodeView debug info emission. */ void @@ -861,6 +947,9 @@ codeview_debug_finish (void) write_line_numbers (); write_codeview_symbols (); + if (custom_types) +write_custom_types (); + if (types_htab) delete types_htab; } @@ -993,10 +1082,88 @@ get_type_num_base_type (dw_die_ref type) } } -/* Process a DIE representing a type definition and return its number. If - it's something we can't handle, return 0. 
We keep a hash table so that - we're not adding the same type multiple times - though if we do it's not - disastrous, as ld will deduplicate everything for us. */ +/* Add a new codeview_custom_type to our singly-linked custom_types list. */ + +static void +add_custom_type (codeview_custom_type *ct) +{ + uint32_t num; + + if (last_custom_type) +{ + num = last_custom_type->num + 1; + last_custom_type->next = ct; +} + else +{ + num = FIRST_TYPE; + custom_types = ct; +} + + last_custom_type = ct; + + ct->num = num; +} + +/* Process a DW_TAG_pointer_type DIE. If
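For readers following the numbering scheme: the add_custom_type hunk above hands out type numbers sequentially, starting at FIRST_TYPE (0x1000), by looking at the tail of the singly-linked list. A minimal standalone sketch of that idea follows. The struct is trimmed down to the two fields that matter here, and new_type is a hypothetical helper added only for illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FIRST_TYPE 0x1000

/* Trimmed-down stand-in for codeview_custom_type: only the list link
   and the assigned type number are kept.  */
struct custom_type
{
  struct custom_type *next;
  uint32_t num;
};

static struct custom_type *custom_types, *last_custom_type;

/* Append CT to the list and give it the next free type number.  The
   first entry gets FIRST_TYPE, each later one gets tail + 1.  */
static void
add_custom_type (struct custom_type *ct)
{
  ct->next = NULL;

  if (last_custom_type)
    {
      ct->num = last_custom_type->num + 1;
      last_custom_type->next = ct;
    }
  else
    {
      ct->num = FIRST_TYPE;
      custom_types = ct;
    }

  last_custom_type = ct;
}

/* Hypothetical helper (not in the patch): allocate an entry, add it to
   the list, and return the number it was given.  */
static uint32_t
new_type (void)
{
  struct custom_type *ct = malloc (sizeof *ct);

  assert (ct != NULL);
  add_custom_type (ct);
  return ct->num;
}
```

Because the number is derived from the tail entry, the order in which types are added is also the order in which they have to be written out, which is presumably why write_custom_types walks the list from the head.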
[PATCH 01/11] Output CodeView data about variables
Parses the DW_TAG_variable DIEs, and outputs S_GDATA32 (for global variables) and S_LDATA32 (for static global variables) symbols into the .debug$S section. gcc/ * dwarf2codeview.cc (S_LDATA32, S_GDATA32): Define. (struct codeview_symbol): New structure. (sym, last_sym): New variables. (write_data_symbol): New function. (write_codeview_symbols): Call write_data_symbol. (add_variable, codeview_debug_early_finish): New functions. * dwarf2codeview.h (codeview_debug_early_finish): Prototype. * dwarf2out.cc (dwarf2out_early_finish): Call codeview_debug_early_finish. --- gcc/dwarf2codeview.cc | 160 ++ gcc/dwarf2codeview.h | 1 + gcc/dwarf2out.cc | 5 ++ 3 files changed, 166 insertions(+) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index db776d79be4..60e84635971 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see #define CHKSUM_TYPE_MD5 1 +#define S_LDATA32 0x110c +#define S_GDATA32 0x110d #define S_COMPILE3 0x113c #define CV_CFL_80386 0x03 @@ -129,6 +131,22 @@ struct codeview_function codeview_line_block *blocks, *last_block; }; +struct codeview_symbol +{ + codeview_symbol *next; + uint16_t kind; + + union + { +struct +{ + uint32_t type; + char *name; + dw_die_ref die; +} data_symbol; + }; +}; + static unsigned int line_label_num; static unsigned int func_label_num; static unsigned int sym_label_num; @@ -140,6 +158,7 @@ static codeview_string *strings, *last_string; static codeview_function *funcs, *last_func; static const char* last_filename; static uint32_t last_file_id; +static codeview_symbol *sym, *last_sym; /* Record new line number against the current function. */ @@ -698,6 +717,77 @@ write_compile3_symbol (void) targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); } +/* Write an S_GDATA32 symbol, representing a global variable, or an S_LDATA32 + symbol, for a static global variable. 
*/ + +static void +write_data_symbol (codeview_symbol *s) +{ + unsigned int label_num = ++sym_label_num; + dw_attr_node *loc; + dw_loc_descr_ref loc_ref; + + /* This is struct datasym in binutils: + + struct datasym + { + uint16_t size; + uint16_t kind; + uint32_t type; + uint32_t offset; + uint16_t section; + char name[]; + } ATTRIBUTE_PACKED; + */ + + /* Extract the DW_AT_location attribute from the DIE, and make sure it's in + in a format we can parse. */ + + loc = get_AT (s->data_symbol.die, DW_AT_location); + if (!loc) +goto end; + + if (loc->dw_attr_val.val_class != dw_val_class_loc) +goto end; + + loc_ref = loc->dw_attr_val.v.val_loc; + if (!loc_ref || loc_ref->dw_loc_opc != DW_OP_addr) +goto end; + + /* Output the S_GDATA32 / S_LDATA32 record. */ + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, + "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n", + label_num, label_num); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_LABEL, label_num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, s->kind); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, s->data_symbol.type); + putc ('\n', asm_out_file); + + asm_fprintf (asm_out_file, "\t.secrel32 "); + output_addr_const (asm_out_file, loc_ref->dw_loc_oprnd1.v.val_addr); + fputc ('\n', asm_out_file); + + asm_fprintf (asm_out_file, "\t.secidx "); + output_addr_const (asm_out_file, loc_ref->dw_loc_oprnd1.v.val_addr); + fputc ('\n', asm_out_file); + + ASM_OUTPUT_ASCII (asm_out_file, s->data_symbol.name, + strlen (s->data_symbol.name) + 1); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); + +end: + free (s->data_symbol.name); +} + /* Write the CodeView symbols into the .debug$S section. 
*/ static void @@ -714,6 +804,22 @@ write_codeview_symbols (void) write_compile3_symbol (); + while (sym) +{ + codeview_symbol *n = sym->next; + + switch (sym->kind) + { + case S_LDATA32: + case S_GDATA32: + write_data_symbol (sym); + break; + } + + free (sym); + sym = n; +} + asm_fprintf (asm_out_file, "%LLcv_syms_end:\n"); } @@ -734,4 +840,58 @@ codeview_debug_finish (void) write_codeview_symbols (); } +/* Process a DW_TAG_variable DIE, and add an S_GDATA32 or S_LDATA32 symbol for + this. */ + +static void +add_variable (dw_die_ref die) +{ + codeview_symbol *s; + const char *name; + + name = get_AT_string (die, DW_AT_name); + if (!name) +return; + + s = (codeview_symbol *) xm
[PATCH 07/11] Handle structs and classes for CodeView
Translates DW_TAG_structure_type DIEs into LF_STRUCTURE symbols, and DW_TAG_class_type DIEs into LF_CLASS symbols. gcc/ * dwarf2codeview.cc (struct codeview_type): Add is_fwd_ref member. (struct codeview_subtype): Add lf_member to union. (struct codeview_custom_type): Add lf_structure to union. (struct codeview_deferred_type): New structure. (deferred_types, last_deferred_type): New variables. (get_type_num): Add new args to prototype. (write_lf_fieldlist): Handle LF_MEMBER subtypes. (write_lf_structure): New function. (write_custom_types): Call write_lf_structure. (get_type_num_pointer_type): Add in_struct argument. (get_type_num_const_type): Likewise. (get_type_num_volatile_type): Likewise. (add_enum_forward_def): Fix get_type_num call. (get_type_num_enumeration_type): Add in_struct argument. (add_deferred_type, flush_deferred_types): New functions. (add_struct_forward_def, get_type_num_struct): Likewise. (get_type_num): Handle self-referential structs. (add_variable): Fix get_type_num call. (codeview_debug_early_finish): Call flush_deferred_types. * dwarf2codeview.h (LF_CLASS, LF_STRUCTURE, LF_MEMBER): Define. 
--- gcc/dwarf2codeview.cc | 513 -- gcc/dwarf2codeview.h | 3 + 2 files changed, 493 insertions(+), 23 deletions(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 475a53573e9..9c6614f6297 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -158,6 +158,7 @@ struct codeview_type { dw_die_ref die; uint32_t num; + bool is_fwd_ref; }; struct die_hasher : free_ptr_hash @@ -197,6 +198,13 @@ struct codeview_subtype { uint32_t type_num; } lf_index; +struct +{ + uint16_t attributes; + uint32_t type; + codeview_integer offset; + char *name; +} lf_member; }; }; @@ -232,9 +240,25 @@ struct codeview_custom_type uint32_t fieldlist; char *name; } lf_enum; +struct +{ + uint16_t num_members; + uint16_t properties; + uint32_t field_list; + uint32_t derived_from; + uint32_t vshape; + codeview_integer length; + char *name; +} lf_structure; }; }; +struct codeview_deferred_type +{ + struct codeview_deferred_type *next; + dw_die_ref type; +}; + static unsigned int line_label_num; static unsigned int func_label_num; static unsigned int sym_label_num; @@ -249,8 +273,9 @@ static uint32_t last_file_id; static codeview_symbol *sym, *last_sym; static hash_table *types_htab; static codeview_custom_type *custom_types, *last_custom_type; +static codeview_deferred_type *deferred_types, *last_deferred_type; -static uint32_t get_type_num (dw_die_ref type); +static uint32_t get_type_num (dw_die_ref type, bool in_struct, bool no_fwd_ref); /* Record new line number against the current function. 
*/ @@ -1217,6 +1242,51 @@ write_lf_fieldlist (codeview_custom_type *t) free (v->lf_enumerate.name); break; + case LF_MEMBER: + /* This is lf_member in binutils and lfMember in Microsoft's +cvinfo.h: + + struct lf_member + { + uint16_t kind; + uint16_t attributes; + uint32_t type; + uint16_t offset; + char name[]; + } ATTRIBUTE_PACKED; + */ + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, LF_MEMBER); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, v->lf_member.attributes); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, v->lf_member.type); + putc ('\n', asm_out_file); + + leaf_len = 8 + write_cv_integer (&v->lf_member.offset); + + if (v->lf_member.name) + { + name_len = strlen (v->lf_member.name) + 1; + ASM_OUTPUT_ASCII (asm_out_file, v->lf_member.name, name_len); + } + else + { + name_len = 1; + ASM_OUTPUT_ASCII (asm_out_file, "", name_len); + } + + leaf_len += name_len; + write_cv_padding (4 - (leaf_len % 4)); + + free (v->lf_member.name); + break; + case LF_INDEX: /* This is lf_index in binutils and lfIndex in Microsoft's cvinfo.h: @@ -1308,6 +1378,82 @@ write_lf_enum (codeview_custom_type *t) asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); } +/* Write an LF_STRUCTURE or LF_CLASS type (the two have the same structure). */ + +static void +write_lf_structure (codeview_custom_type *t) +{ + size_t name_len, leaf_len
[PATCH 05/11] Handle const and volatile modifiers for CodeView
Translate DW_TAG_const_type and DW_TAG_volatile_type DIEs into LF_MODIFIER symbols. gcc/ * dwarf2codeview.cc (struct codeview_custom_type): Add lf_modifier to union. (write_cv_padding, write_lf_modifier): New functions. (write_custom_types): Call write_lf_modifier. (get_type_num_const_type): New function. (get_type_num_volatile_type): Likewise. (get_type_num): Handle DW_TAG_const_type and DW_TAG_volatile_type DIEs. * dwarf2codeview.h (MOD_const, MOD_volatile): Define. (LF_MODIFIER): Likewise. --- gcc/dwarf2codeview.cc | 157 ++ gcc/dwarf2codeview.h | 5 ++ 2 files changed, 162 insertions(+) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 51401f2d5bc..05f5f60997e 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -183,6 +183,11 @@ struct codeview_custom_type uint32_t base_type; uint32_t attributes; } lf_pointer; +struct +{ + uint32_t base_type; + uint16_t modifier; +} lf_modifier; }; }; @@ -903,6 +908,76 @@ write_lf_pointer (codeview_custom_type *t) asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); } +/* All CodeView type definitions have to be aligned to a four-byte boundary, + so write some padding bytes if necessary. These have to be specific values: + f3, f2, f1. */ + +static void +write_cv_padding (size_t padding) +{ + if (padding == 4 || padding == 0) +return; + + if (padding == 3) +{ + fputs (integer_asm_op (1, false), asm_out_file); + fprint_whex (asm_out_file, 0xf3); + putc ('\n', asm_out_file); +} + + if (padding >= 2) +{ + fputs (integer_asm_op (1, false), asm_out_file); + fprint_whex (asm_out_file, 0xf2); + putc ('\n', asm_out_file); +} + + fputs (integer_asm_op (1, false), asm_out_file); + fprint_whex (asm_out_file, 0xf1); + putc ('\n', asm_out_file); +} + +/* Write an LF_MODIFIER type, representing a const and/or volatile modification + of another type. 
*/ + +static void +write_lf_modifier (codeview_custom_type *t) +{ + /* This is lf_modifier in binutils and lfModifier in Microsoft's cvinfo.h: + +struct lf_modifier +{ + uint16_t size; + uint16_t kind; + uint32_t base_type; + uint16_t modifier; + uint16_t padding; +} ATTRIBUTE_PACKED; + */ + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n", + t->num, t->num); + + asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, t->kind); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_modifier.base_type); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_modifier.modifier); + putc ('\n', asm_out_file); + + write_cv_padding (2); + + asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); +} + /* Write the .debug$T section, which contains all of our custom type definitions. */ @@ -924,6 +999,10 @@ write_custom_types (void) case LF_POINTER: write_lf_pointer (custom_types); break; + + case LF_MODIFIER: + write_lf_modifier (custom_types); + break; } free (custom_types); @@ -1159,6 +1238,76 @@ get_type_num_pointer_type (dw_die_ref type) return ct->num; } +/* Process a DW_TAG_const_type DIE, adding an LF_MODIFIER type and returning + its number. */ + +static uint32_t +get_type_num_const_type (dw_die_ref type) +{ + dw_die_ref base_type; + uint32_t base_type_num; + codeview_custom_type *ct; + bool is_volatile = false; + + base_type = get_AT_ref (type, DW_AT_type); + if (!base_type) +return 0; + + /* Handle case when this is a const volatile type - we only need one + LF_MODIFIER for this. 
*/ + if (dw_get_die_tag (base_type) == DW_TAG_volatile_type) +{ + is_volatile = true; + + base_type = get_AT_ref (base_type, DW_AT_type); + if (!base_type) + return 0; +} + + base_type_num = get_type_num (base_type); + if (base_type_num == 0) +return 0; + + ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type)); + + ct->next = NULL; + ct->kind = LF_MODIFIER; + ct->lf_modifier.base_type = base_type_num; + ct->lf_modifier.modifier = MOD_const; + + if (is_volatile) +ct->lf_modifier.modifier |= MOD_volatile; + + add_custom_type (ct); + + return ct->num; +} + +/* Process a DW_TAG_volatile_type DIE, adding an LF_MODIFIER type and + returning its number. */ + +static uint32_t +get_type_num_volatile_type (dw_die_ref type) +{ + uint32_t base_type_num; + codeview_custom_type *ct; + + base_type_num = get_type_num (get_AT_ref (type, DW_AT_ty
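As a side note on write_cv_padding from this patch: the pad bytes count down towards the next four-byte boundary, so three bytes of padding are f3 f2 f1, two are f2 f1, and one is f1. The sketch below shows the same rule applied to an in-memory buffer rather than to assembler output. The function name pad_to_four and the buffer interface are assumptions made for illustration; the patch itself takes the number of pad bytes as its argument and emits assembler directives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append the CodeView pad bytes needed to bring LEN up to a multiple
   of four, and return the new length.  BUF must have room for up to
   three more bytes.  Each pad byte is 0xf0 plus the number of pad
   bytes still to come, counting itself, giving f3 f2 f1.  */
static size_t
pad_to_four (uint8_t *buf, size_t len)
{
  size_t padding = (4 - (len % 4)) % 4;

  assert (buf != NULL);

  while (padding > 0)
    {
      buf[len++] = (uint8_t) (0xf0 + padding);
      padding--;
    }

  return len;
}
```

For the LF_MODIFIER record above, the size field, kind, base type and modifier add up to ten bytes, which is why the patch calls write_cv_padding (2) unconditionally there.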
[PATCH 10/11] Handle bitfields for CodeView
Translates structure members with DW_AT_data_bit_offset set in DWARF into LF_BITFIELD symbols. gcc/ * dwarf2codeview.cc (struct codeview_custom_type): Add lf_bitfield to union. (write_lf_bitfield): New function. (write_custom_types): Call write_lf_bitfield. (create_bitfield): New function. (get_type_num_struct): Handle bitfields. * dwarf2codeview.h (LF_BITFIELD): Define. --- gcc/dwarf2codeview.cc | 89 ++- gcc/dwarf2codeview.h | 1 + 2 files changed, 88 insertions(+), 2 deletions(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 3f1ce5577fc..06267639169 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -256,6 +256,12 @@ struct codeview_custom_type uint32_t index_type; codeview_integer length_in_bytes; } lf_array; +struct +{ + uint32_t base_type; + uint8_t length; + uint8_t position; +} lf_bitfield; }; }; @@ -1573,6 +1579,50 @@ write_lf_array (codeview_custom_type *t) asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); } +/* Write an LF_BITFIELD type. 
*/ + +static void +write_lf_bitfield (codeview_custom_type *t) +{ + /* This is lf_bitfield in binutils and lfBitfield in Microsoft's cvinfo.h: + +struct lf_bitfield +{ + uint16_t size; + uint16_t kind; + uint32_t base_type; + uint8_t length; + uint8_t position; +} ATTRIBUTE_PACKED; + */ + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n", + t->num, t->num); + + asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, t->kind); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_bitfield.base_type); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (1, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_bitfield.length); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (1, false), asm_out_file); + fprint_whex (asm_out_file, t->lf_bitfield.position); + putc ('\n', asm_out_file); + + write_cv_padding (2); + + asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); +} + /* Write the .debug$T section, which contains all of our custom type definitions. */ @@ -1619,6 +1669,10 @@ write_custom_types (void) case LF_ARRAY: write_lf_array (custom_types); break; + + case LF_BITFIELD: + write_lf_bitfield (custom_types); + break; } free (custom_types); @@ -2199,6 +2253,33 @@ add_struct_forward_def (dw_die_ref type) return ct->num; } +/* Add an LF_BITFIELD type, returning its number. DWARF represents bitfields + as members in a struct with a DW_AT_data_bit_offset attribute, whereas in + CodeView they're a distinct type. 
*/ + +static uint32_t +create_bitfield (dw_die_ref c) +{ + codeview_custom_type *ct; + uint32_t base_type; + + base_type = get_type_num (get_AT_ref (c, DW_AT_type), true, false); + if (base_type == 0) +return 0; + + ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type)); + + ct->next = NULL; + ct->kind = LF_BITFIELD; + ct->lf_bitfield.base_type = base_type; + ct->lf_bitfield.length = get_AT_unsigned (c, DW_AT_bit_size); + ct->lf_bitfield.position = get_AT_unsigned (c, DW_AT_data_bit_offset); + + add_custom_type (ct); + + return ct->num; +} + /* Process a DW_TAG_structure_type, DW_TAG_class_type, or DW_TAG_union_type DIE, add an LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS / LF_UNION type, and return the number of the latter. */ @@ -2279,8 +2360,12 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref) break; } - el->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type), true, - false); + if (get_AT (c, DW_AT_data_bit_offset)) + el->lf_member.type = create_bitfield (c); + else + el->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type), + true, false); + el->lf_member.offset.neg = false; el->lf_member.offset.num = get_AT_unsigned (c, DW_AT_data_member_location); diff --git a/gcc/dwarf2codeview.h b/gcc/dwarf2codeview.h index 70eed6bf2aa..70eae554b80 100644 --- a/gcc/dwarf2codeview.h +++ b/gcc/dwarf2codeview.h @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3. If not see #define LF_MODIFIER0x1001 #define LF_POINTER 0x1002 #define LF_FIELDLIST 0x1203 +#define LF_BITFIELD0x1205 #define LF_INDEX 0x1404 #define LF_ENUMERATE 0x1502 #define LF_ARRAY 0x1503
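To make the record layout in write_lf_bitfield concrete, here is a rough sketch that serializes one LF_BITFIELD record into a byte buffer. It assumes little-endian output, as on the x86 PE targets this is aimed at. The name encode_lf_bitfield and the put16/put32 helpers are hypothetical, and the base type 0x0075 used in the usage note is what I believe T_UINT4 to be in cvinfo.h, so treat that value as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LF_BITFIELD 0x1205

/* Store a 16-bit value in little-endian order; returns bytes written.  */
static size_t
put16 (uint8_t *p, uint16_t v)
{
  p[0] = (uint8_t) (v & 0xff);
  p[1] = (uint8_t) (v >> 8);
  return 2;
}

/* Store a 32-bit value in little-endian order; returns bytes written.  */
static size_t
put32 (uint8_t *p, uint32_t v)
{
  for (size_t i = 0; i < 4; i++)
    p[i] = (uint8_t) (v >> (8 * i));
  return 4;
}

/* Serialize an LF_BITFIELD record into BUF, which must hold at least
   12 bytes.  The leading size field counts everything after itself:
   kind (2) + base_type (4) + length (1) + position (1) + padding (2).  */
static size_t
encode_lf_bitfield (uint8_t *buf, size_t cap, uint32_t base_type,
		    uint8_t length, uint8_t position)
{
  size_t n = 0;

  assert (cap >= 12);

  n += put16 (buf + n, 10);
  n += put16 (buf + n, LF_BITFIELD);
  n += put32 (buf + n, base_type);
  buf[n++] = length;
  buf[n++] = position;
  buf[n++] = 0xf2;	/* two pad bytes up to the four-byte boundary */
  buf[n++] = 0xf1;

  return n;
}
```

Calling encode_lf_bitfield (buf, sizeof buf, 0x0075, 3, 5) would describe a three-bit field starting at bit 5 of an unsigned int, under the T_UINT4 assumption above.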
[PATCH 03/11] Handle typedefs for CodeView
gcc/ * dwarf2codeview.cc (get_type_num): Handle typedefs. --- gcc/dwarf2codeview.cc | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index eb7c1270e31..5006a176260 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -1024,6 +1024,12 @@ get_type_num (dw_die_ref type) t->num = get_type_num_base_type (type); break; +case DW_TAG_typedef: + /* FIXME - signed longs typedef'd as "HRESULT" should get their +own type (T_HRESULT) */ + t->num = get_type_num (get_AT_ref (type, DW_AT_type)); + break; + default: t->num = 0; break; -- 2.44.2
Re: [committed] testsuite: Add -Wno-psabi to vshuf-mem.C test
On 6/14/24 20:03, Jakub Jelinek wrote:
> Also wonder about the
> // { dg-additional-options "-march=z14" { target s390*-*-* } }
> line, doesn't that mean the test will FAIL on all pre-z14 HW?
> Shouldn't it use some z14_runtime or similar effective target, or check
> in main (in that case copied over to g++.target/s390) whether z14
> instructions can be actually used at runtime?

Oh right. I'll remove that line and replicate the testcase in the arch specific test dir.

Andreas
[PATCH 02/11] Handle CodeView base types
Adds a get_type_num function to translate type DIEs into CodeView numbers, along with a hash table for this. For now we just deal with the base types (integers, Unicode chars, floats, and bools). gcc/ * dwarf2codeview.cc (struct codeview_type): New structure. (struct die_hasher): Likewise. (types_htab): New variable. (codeview_debug_finish): Free types_htab if allocated. (get_type_num_base_type, get_type_num): New functions. (add_variable): Call get_type_num. * dwarf2codeview.h (T_CHAR, T_SHORT, T_LONG, T_QUAD): Define. (T_UCHAR, T_USHORT, T_ULONG, T_UQUAD, T_BOOL08): Likewise. (T_REAL32, T_REAL64, T_REAL80, T_REAL128, T_RCHAR): Likewise. (T_WCHAR, T_INT4, T_UINT4, T_CHAR16, T_CHAR32, T_CHAR8): Likewise. --- gcc/dwarf2codeview.cc | 196 +- gcc/dwarf2codeview.h | 23 + 2 files changed, 218 insertions(+), 1 deletion(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 60e84635971..eb7c1270e31 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -147,6 +147,27 @@ struct codeview_symbol }; }; +struct codeview_type +{ + dw_die_ref die; + uint32_t num; +}; + +struct die_hasher : free_ptr_hash <codeview_type> +{ + typedef dw_die_ref compare_type; + + static hashval_t hash (const codeview_type *x) + { +return htab_hash_pointer (x->die); + } + + static bool equal (const codeview_type *x, const dw_die_ref y) + { +return x->die == y; + } +}; + static unsigned int line_label_num; static unsigned int func_label_num; static unsigned int sym_label_num; @@ -159,6 +180,7 @@ static codeview_function *funcs, *last_func; static const char* last_filename; static uint32_t last_file_id; static codeview_symbol *sym, *last_sym; +static hash_table<die_hasher> *types_htab; /* Record new line number against the current function. */ @@ -838,6 +860,178 @@ codeview_debug_finish (void) write_source_files (); write_line_numbers (); write_codeview_symbols (); + + if (types_htab) +delete types_htab; +} + +/* Translate a DWARF base type (DW_TAG_base_type) into its CodeView + equivalent. 
*/ + +static uint32_t +get_type_num_base_type (dw_die_ref type) +{ + unsigned int size = get_AT_unsigned (type, DW_AT_byte_size); + + switch (get_AT_unsigned (type, DW_AT_encoding)) +{ +case DW_ATE_signed_char: + { + const char *name = get_AT_string (type, DW_AT_name); + + if (size != 1) + return 0; + + if (name && !strcmp (name, "signed char")) + return T_CHAR; + else + return T_RCHAR; + } + +case DW_ATE_unsigned_char: + if (size != 1) + return 0; + + return T_UCHAR; + +case DW_ATE_signed: + switch (size) + { + case 2: + return T_SHORT; + + case 4: + { + const char *name = get_AT_string (type, DW_AT_name); + + if (name && !strcmp (name, "int")) + return T_INT4; + else + return T_LONG; + } + + case 8: + return T_QUAD; + + default: + return 0; + } + +case DW_ATE_unsigned: + switch (size) + { + case 2: + { + const char *name = get_AT_string (type, DW_AT_name); + + if (name && !strcmp (name, "wchar_t")) + return T_WCHAR; + else + return T_USHORT; + } + + case 4: + { + const char *name = get_AT_string (type, DW_AT_name); + + if (name && !strcmp (name, "unsigned int")) + return T_UINT4; + else + return T_ULONG; + } + + case 8: + return T_UQUAD; + + default: + return 0; + } + +case DW_ATE_UTF: + switch (size) + { + case 1: + return T_CHAR8; + + case 2: + return T_CHAR16; + + case 4: + return T_CHAR32; + + default: + return 0; + } + +case DW_ATE_float: + switch (size) + { + case 4: + return T_REAL32; + + case 8: + return T_REAL64; + + case 12: + return T_REAL80; + + case 16: + return T_REAL128; + + default: + return 0; + } + +case DW_ATE_boolean: + if (size == 1) + return T_BOOL08; + else + return 0; + +default: + return 0; +} +} + +/* Process a DIE representing a type definition and return its number. If + it's something we can't handle, return 0. We keep a hash table so that + we're not adding the same type multiple times - though if we do it's not + disastrous, as ld will deduplicate everything for us. 
*/ + +static uint32_t +get_type_num (dw_die_ref type) +{ + codeview_type **slot, *t; + + if (!type) +return 0; + + if (!types_htab) +types_htab = new hash_table<die_hasher> (10); + + slot = types_htab->find_slot
Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]
On 6/17/24 6:11 PM, Segher Boessenkool wrote: > "ROP insns" are the instructions used in such exploits, not what you > mean here :-) > > The instructions are called "hash*", so maybe call them "hash insns" > or "ROP protect hash insns"? OK, that bad verbiage was in the extra commentary, not part of the git log entry. That said, I'll reword that to the following: Only disable shrink-wrapping when using -mrop-protect when we know we -will be emitting the ROP instructions (ie, non-leaf functions). +will be emitting the ROP protect hash instructions (ie, non-leaf functions). >> * config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move >> the disabling of shrink-wrapping from here >> * config/rs6000/rs6000-logue.cc (rs6000_stack_info): ...to here. > > Hrm. Can you do it in some particular caller of rs6000_stack_info, > instead? The rs6000_stack_info function itself is not supposed to > change any state whatsoever. Sure, I can look at maybe moving that to the caller or maybe somewhere better. I'll repost the patch once I find a better location. > The comment should say *why*! The fact that we do is clear from the > code itself already. But why do we want this? > >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void) >> } >>else if (!OPTION_SET_P (flag_cunroll_grow_size)) >> flag_cunroll_grow_size = flag_peel_loops || optimize >= 3; >> - >> - /* If we are inserting ROP-protect instructions, disable shrink wrap. */ >> - if (rs6000_rop_protect) >> -flag_shrink_wrap = 0; >> } > > (Yes, I know the original code didn't say either, but let's try to make > things better :-) ) Yeah, I didn't write that, I only moved it, but I can try to come up with an explanation of why we need to disable it now. That said, my hope is to not have to disable shrink-wrapping even when we emit the ROP protect hash insns in the future, but that will take some extra work. 
If I can manage that, then this should all just go away. :-) Until then, we can stick with this patch's micro-optimization. >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect >> -fdump-rtl-pro_and_epilogue" } */ >> +/* { dg-require-effective-target rop_ok } */ > > Do you want rop_ok while you are *forcing* it to be okay anyway? Why? At the moment, yes, since the rop_ok test not only checks for the -mcpu= level, it also verifies that the ABI is ok. Currently, rop_ok makes sure we have Power10 and ELFv2 ABI being used. So currently, if we were to run this test on BE, we'd get an UNSUPPORTED using the rop_ok check, but if we removed it, we'd see a FAIL. As we discussed offline, the plan is to eventually enable emitting the ROP protect hash insns on other ABIs, but until then, I think we want to keep the rop_ok check so as to keep Bill's CI builder from flagging it as a FAIL. Peter
[PATCH 06/11] Handle enums for CodeView
Translates DW_TAG_enumeration_type DIEs into LF_ENUM symbols. gcc/ * dwarf2codeview.cc (MAX_FIELDLIST_SIZE): Define. (struct codeview_integer): New structure. (struct codeview_subtype): Likewise (struct codeview_custom_type): Add lf_fieldlist and lf_enum to union. (write_cv_integer, cv_integer_len): New functions. (write_lf_fieldlist, write_lf_enum): Likewise. (write_custom_types): Call write_lf_fieldlist and write_lf_enum. (add_enum_forward_def): New function. (get_type_num_enumeration_type): Likewise. (get_type_num): Handle DW_TAG_enumeration_type DIEs. * dwarf2codeview.h (LF_FIELDLIST, LF_INDEX, LF_ENUMERATE): Define. (LF_ENUM, LF_CHAR, LF_SHORT, LF_USHORT, LF_LONG): Likewise. (LF_ULONG, LF_QUADWORD, LF_UQUADWORD): Likewise. (CV_ACCESS_PRIVATE, CV_ACCESS_PROTECTED): Likewise. (CV_ACCESS_PUBLIC, CV_PROP_FWDREF): Likewise. --- gcc/dwarf2codeview.cc | 524 ++ gcc/dwarf2codeview.h | 17 ++ 2 files changed, 541 insertions(+) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 05f5f60997e..475a53573e9 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -63,6 +63,11 @@ along with GCC; see the file COPYING3. If not see #define SYMBOL_START_LABEL "Lcvsymstart" #define SYMBOL_END_LABEL "Lcvsymend" +/* There's two bytes available for each type's size, but follow MSVC's lead in + capping the LF_FIELDLIST size at fb00 (minus 8 bytes for the LF_INDEX + pointing to the overflow entry). 
*/ +#define MAX_FIELDLIST_SIZE 0xfaf8 + #define HASH_SIZE 16 struct codeview_string @@ -170,6 +175,31 @@ struct die_hasher : free_ptr_hash } }; +struct codeview_integer +{ + bool neg; + uint64_t num; +}; + +struct codeview_subtype +{ + struct codeview_subtype *next; + uint16_t kind; + + union + { +struct +{ + char *name; + struct codeview_integer value; +} lf_enumerate; +struct +{ + uint32_t type_num; +} lf_index; + }; +}; + struct codeview_custom_type { struct codeview_custom_type *next; @@ -188,6 +218,20 @@ struct codeview_custom_type uint32_t base_type; uint16_t modifier; } lf_modifier; +struct +{ + size_t length; + codeview_subtype *subtypes; + codeview_subtype *last_subtype; +} lf_fieldlist; +struct +{ + uint16_t count; + uint16_t properties; + uint32_t underlying_type; + uint32_t fieldlist; + char *name; +} lf_enum; }; }; @@ -978,6 +1022,292 @@ write_lf_modifier (codeview_custom_type *t) asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num); } +/* Write a CodeView extensible integer. If the value is non-negative and + < 0x8000, the value gets written directly as an uint16_t. Otherwise, we + output two bytes for the integer type (LF_CHAR, LF_SHORT, ...), and the + actual value follows. 
*/ + +static size_t +write_cv_integer (codeview_integer *i) +{ + if (i->neg) +{ + if (i->num <= 0x80) + { + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, LF_CHAR); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (1, false), asm_out_file); + fprint_whex (asm_out_file, -i->num); + putc ('\n', asm_out_file); + + return 3; + } + else if (i->num <= 0x8000) + { + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, LF_SHORT); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, -i->num); + putc ('\n', asm_out_file); + + return 4; + } + else if (i->num <= 0x80000000) + { + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, LF_LONG); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, -i->num); + putc ('\n', asm_out_file); + + return 6; + } + else + { + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, LF_QUADWORD); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (8, false), asm_out_file); + fprint_whex (asm_out_file, -i->num); + putc ('\n', asm_out_file); + + return 10; + } +} + else +{ + if (i->num <= 0x7fff) + { + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, i->num); + putc ('\n', asm_out_file); + + return 2; + } + else if (i->num <= 0xffff) + { + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, LF_USHORT); + putc ('
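Since the write_cv_integer hunk is cut off above, here is a compact sketch of the encoding its comment describes: a non-negative value below 0x8000 is stored directly as a uint16_t, and anything else gets a two-byte leaf kind followed by the value in the narrowest width that fits. The sketch writes little-endian bytes to a buffer instead of emitting assembler directives. The name encode_cv_integer is hypothetical, and the LF_* values are the ones I recall from Microsoft's cvinfo.h rather than something shown in this thread, so please check them against dwarf2codeview.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Leaf kinds for numeric values.  Assumed from cvinfo.h; the header
   hunk defining them is not part of the text above.  */
#define LF_CHAR		0x8000
#define LF_SHORT	0x8001
#define LF_USHORT	0x8002
#define LF_LONG		0x8003
#define LF_ULONG	0x8004
#define LF_QUADWORD	0x8009
#define LF_UQUADWORD	0x800a

/* Store the low NBYTES bytes of V in little-endian order.  */
static size_t
put_le (uint8_t *p, uint64_t v, size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    p[i] = (uint8_t) (v >> (8 * i));
  return nbytes;
}

/* Encode the value NUM (negated if NEG is set) into BUF, which must
   hold at least 10 bytes.  Returns the number of bytes written.  */
static size_t
encode_cv_integer (uint8_t *buf, size_t cap, bool neg, uint64_t num)
{
  uint16_t kind;
  size_t width, n;

  assert (cap >= 10);

  if (neg)
    {
      if (num <= 0x80)
	kind = LF_CHAR, width = 1;
      else if (num <= 0x8000)
	kind = LF_SHORT, width = 2;
      else if (num <= 0x80000000)
	kind = LF_LONG, width = 4;
      else
	kind = LF_QUADWORD, width = 8;

      /* Unsigned negation gives the two's complement bit pattern.  */
      num = -num;
    }
  else
    {
      /* Small non-negative values need no leaf kind at all.  */
      if (num <= 0x7fff)
	return put_le (buf, num, 2);
      else if (num <= 0xffff)
	kind = LF_USHORT, width = 2;
      else if (num <= 0xffffffff)
	kind = LF_ULONG, width = 4;
      else
	kind = LF_UQUADWORD, width = 8;
    }

  n = put_le (buf, kind, 2);
  n += put_le (buf + n, num, width);
  return n;
}
```

The returned lengths (2, 3, 4, 6 or 10 bytes) line up with the return values in the patch, which callers such as the LF_MEMBER writer add to the leaf length before working out the padding.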
[PATCH 08/11] Handle unions for CodeView.
Translates DW_TAG_union_type DIEs into LF_UNION symbols.

gcc/
        * dwarf2codeview.cc (write_lf_union): New function.
        (write_custom_types): Call write_lf_union.
        (add_struct_forward_def): Handle DW_TAG_union_type DIEs.
        (get_type_num_struct): Handle unions.
        (get_type_num): Handle DW_TAG_union_type DIEs.
        * dwarf2codeview.h (LF_UNION): Define.
---
 gcc/dwarf2codeview.cc | 91 ---
 gcc/dwarf2codeview.h  |  1 +
 2 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9c6614f6297..9e3b64522b2 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -1454,6 +1454,72 @@ write_lf_structure (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_UNION type.  */
+
+static void
+write_lf_union (codeview_custom_type *t)
+{
+  size_t name_len, leaf_len;
+
+  /* This is lf_union in binutils and lfUnion in Microsoft's cvinfo.h:
+
+    struct lf_union
+    {
+      uint16_t size;
+      uint16_t kind;
+      uint16_t num_members;
+      uint16_t properties;
+      uint32_t field_list;
+      uint16_t length;
+      char name[];
+    } ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+               t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.num_members);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.properties);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.field_list);
+  putc ('\n', asm_out_file);
+
+  leaf_len = 12 + write_cv_integer (&t->lf_structure.length);
+
+  if (t->lf_structure.name)
+    {
+      name_len = strlen (t->lf_structure.name) + 1;
+      ASM_OUTPUT_ASCII (asm_out_file, t->lf_structure.name, name_len);
+    }
+  else
+    {
+      static const char unnamed_struct[] = "";
+
+      name_len = sizeof (unnamed_struct);
+      ASM_OUTPUT_ASCII (asm_out_file, unnamed_struct, name_len);
+    }
+
+  leaf_len += name_len;
+  write_cv_padding (4 - (leaf_len % 4));
+
+  free (t->lf_structure.name);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
    definitions.  */
 
@@ -1492,6 +1558,10 @@ write_custom_types (void)
         case LF_CLASS:
           write_lf_structure (custom_types);
           break;
+
+        case LF_UNION:
+          write_lf_union (custom_types);
+          break;
         }
 
       free (custom_types);
@@ -2026,7 +2096,7 @@ flush_deferred_types (void)
   last_deferred_type = NULL;
 }
 
-/* Add a forward definition for a struct or class.  */
+/* Add a forward definition for a struct, class, or union.  */
 
 static uint32_t
 add_struct_forward_def (dw_die_ref type)
@@ -2047,6 +2117,10 @@ add_struct_forward_def (dw_die_ref type)
       ct->kind = LF_STRUCTURE;
       break;
 
+    case DW_TAG_union_type:
+      ct->kind = LF_UNION;
+      break;
+
     default:
       break;
     }
@@ -2068,9 +2142,9 @@ add_struct_forward_def (dw_die_ref type)
   return ct->num;
 }
 
-/* Process a DW_TAG_structure_type or DW_TAG_class_type DIE, add an
-   LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS type, and return the number of
-   the latter.  */
+/* Process a DW_TAG_structure_type, DW_TAG_class_type, or DW_TAG_union_type
+   DIE, add an LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS / LF_UNION type,
+   and return the number of the latter.  */
 
 static uint32_t
 get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
@@ -2227,8 +2301,8 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
       ct = ct2;
     }
 
-  /* Now add an LF_STRUCTURE / LF_CLASS, pointing to the LF_FIELDLIST we just
-     added.  */
+  /* Now add an LF_STRUCTURE / LF_CLASS / LF_UNION, pointing to the
+     LF_FIELDLIST we just added.  */
   ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
 
@@ -2244,6 +2318,10 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
       ct->kind = LF_STRUCTURE;
       break;
 
+    case DW_TAG_union_type:
+      ct->kind = LF_UNION;
+      break;
+
     default:
       break;
     }
@@ -2325,6 +2403,7 @@ get_type_num (dw_die_ref type, bool in_struct, bool no_fwd_ref)
 
     case DW_TAG_structure_type:
     case DW_TAG_class_type:
+    case DW_TAG_union_type:
       num = get_type_num_struct (type, in_struct, &is_fwd_ref);
       break;
 
diff --git a/gcc/dwarf2codeview.h b/gcc/dwar
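A note on the length arithmetic in write_lf_union, sketched as standalone C. The constant 12 is the fixed part of the leaf: size, kind, num_members and properties at two bytes each, plus a four-byte field_list. The encoded length integer and the NUL-terminated name are added on top, and the result is rounded up to a multiple of four. The helper names below are hypothetical, and treating an argument of 4 as "no padding needed" is an assumption about write_cv_padding, whose body is not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* size + kind + num_members + properties (2 bytes each) + field_list (4).  */
#define LF_UNION_FIXED_LEN 12

/* Padding bytes needed to bring LEAF_LEN up to a multiple of 4.
   Assumption: a computed value of 4 means the leaf is already aligned.  */
static size_t
cv_padding_bytes (size_t leaf_len)
{
  size_t padding = 4 - (leaf_len % 4);
  return padding == 4 ? 0 : padding;
}

/* Total size of an LF_UNION leaf, given the number of bytes the encoded
   length integer takes (2, 4, 6 or 10) and the union's name.  */
static size_t
lf_union_total_len (size_t length_field_len, const char *name)
{
  assert (name != NULL);
  size_t leaf_len = LF_UNION_FIXED_LEN + length_field_len + strlen (name) + 1;
  return leaf_len + cv_padding_bytes (leaf_len);
}
```

For example, a union named "foo" whose size fits the short two-byte form gives 12 + 2 + 4 = 18 bytes, so two padding bytes bring the leaf to 20.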
[PATCH 09/11] Handle arrays for CodeView
Translates DW_TAG_array_type DIEs into LF_ARRAY symbols.

gcc/
        * dwarf2codeview.cc (struct codeview_custom_type): Add lf_array to
        union.
        (write_lf_array): New function.
        (write_custom_types): Call write_lf_array.
        (get_type_num_array_type): New function.
        (get_type_num): Handle DW_TAG_array_type DIEs.
        * dwarf2codeview.h (LF_ARRAY): Define.
---
 gcc/dwarf2codeview.cc | 179 ++
 gcc/dwarf2codeview.h  |   1 +
 2 files changed, 180 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9e3b64522b2..3f1ce5577fc 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -250,6 +250,12 @@ struct codeview_custom_type
       codeview_integer length;
       char *name;
     } lf_structure;
+    struct
+    {
+      uint32_t element_type;
+      uint32_t index_type;
+      codeview_integer length_in_bytes;
+    } lf_array;
   };
 };
 
@@ -1520,6 +1526,53 @@ write_lf_union (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_ARRAY type.  */
+
+static void
+write_lf_array (codeview_custom_type *t)
+{
+  size_t leaf_len;
+
+  /* This is lf_array in binutils and lfArray in Microsoft's cvinfo.h:
+
+    struct lf_array
+    {
+      uint16_t size;
+      uint16_t kind;
+      uint32_t element_type;
+      uint32_t index_type;
+      uint16_t length_in_bytes;
+      char name[];
+    } ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+               t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_array.element_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_array.index_type);
+  putc ('\n', asm_out_file);
+
+  leaf_len = 13 + write_cv_integer (&t->lf_array.length_in_bytes);
+
+  ASM_OUTPUT_ASCII (asm_out_file, "", 1);
+
+  write_cv_padding (4 - (leaf_len % 4));
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
    definitions.  */
 
@@ -1562,6 +1615,10 @@ write_custom_types (void)
         case LF_UNION:
           write_lf_union (custom_types);
           break;
+
+        case LF_ARRAY:
+          write_lf_array (custom_types);
+          break;
         }
 
       free (custom_types);
@@ -2346,6 +2403,124 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
   return ct->num;
 }
 
+/* Process a DW_TAG_array_type DIE, adding an LF_ARRAY type and returning its
+   number.  */
+
+static uint32_t
+get_type_num_array_type (dw_die_ref type, bool in_struct)
+{
+  dw_die_ref base_type, t, first_child, c, *dimension_arr;
+  uint64_t size = 0;
+  unsigned int dimensions, i;
+  uint32_t element_type;
+
+  base_type = get_AT_ref (type, DW_AT_type);
+  if (!base_type)
+    return 0;
+
+  /* We need to know the size of our base type.  Loop through until we find
+     it.  */
+  t = base_type;
+  while (t && size == 0)
+    {
+      switch (dw_get_die_tag (t))
+        {
+        case DW_TAG_const_type:
+        case DW_TAG_volatile_type:
+        case DW_TAG_typedef:
+        case DW_TAG_enumeration_type:
+          t = get_AT_ref (t, DW_AT_type);
+          break;
+
+        case DW_TAG_base_type:
+        case DW_TAG_structure_type:
+        case DW_TAG_class_type:
+        case DW_TAG_union_type:
+        case DW_TAG_pointer_type:
+          size = get_AT_unsigned (t, DW_AT_byte_size);
+          break;
+
+        default:
+          return 0;
+        }
+    }
+
+  if (size == 0)
+    return 0;
+
+  first_child = dw_get_die_child (type);
+  if (!first_child)
+    return 0;
+
+  element_type = get_type_num (base_type, in_struct, false);
+  if (element_type == 0)
+    return 0;
+
+  /* Create an array of our DW_TAG_subrange_type children, in reverse order.
+     We have to do this because unlike DWARF CodeView doesn't have
+     multidimensional arrays, so instead we do arrays of arrays.  */
+
+  dimensions = 0;
+  c = first_child;
+  do
+    {
+      c = dw_get_die_sib (c);
+      if (dw_get_die_tag (c) != DW_TAG_subrange_type)
+        continue;
+
+      dimensions++;
+    }
+  while (c != first_child);
+
+  if (dimensions == 0)
+    return 0;
+
+  dimension_arr = (dw_die_ref *) xmalloc (sizeof (dw_die_ref) * dimensions);
+
+  c = first_child;
+  i = 0;
+  do
+    {
+      c = dw_get_die_sib (c);
+      if (dw_get_die_tag (c) != DW_TAG_subrange_type)
+        continue;
+
+      dimension_arr[dimensions - i - 1] = c;
+      i++;
+    }
+  while (c != first_child);
+
+  /* Record an LF_ARRAY entry for each array dimensi