Fix update_bb_profile_for_threading
Hi,
this patch fixes some of the profile mismatches caused by profile updating.
It seems that I misupdated update_bb_profile_for_threading in 2017, which
results in invalid updates from RTL threading and threadbackwards.

update_bb_profile_for_threading knows that some paths to BB are being
redirected elsewhere and that those paths will exit from BB with E.  So it
needs to determine the probability of the duplicated path and redistribute
probabilities.  For some reason, however, the conditional probability of the
redirected path is computed after its count is subtracted, which is wrong and
often results in a probability greater than 100%.

I also fixed the error message.  Compiling tramp3d I now get the following
passes producing mismatches:

Pass dump id and name   |static mismatch|dynamic mismatch
                        |in count       |in count
113t fre                |    2     +2|          0
114t mergephi           |    2       |          0
115t threadfull         |    2       |          0
116t vrp                |    2       |          0
127t ch                 |  307   +305|  347194302   +347194302
130t thread             |  313     +6|  347221478       +27176
131t dom                |  321     +8|  346841121      -380357
134t reassoc            |  323     +2|  346841121
136t forwprop           |  327     +4|  347026371      +185250
144t pre                |  326     -1|  347040926       +14555
172t ifcvt              |  338     +2|  347218249      +156280
173t vect               |  409    +71|  356357418     +9139169
176t cunroll            |  377    -32|  126071925   -230285493
183t loopdone           |  376     -1|  126015489       -56436
194t tracer             |  379     +3|  127258199     +1242710
197t dom                |  375     -4|  128352165     +1093966
199t threadfull         |  379     +4|  128526112      +173947
200t vrp                |  381     +2|  128724673      +198561
204t dce                |  374     -7|  128632495       -92178
206t sink               |  370     -4|  128618043       -14452
211t cddce              |  372     +2|  128632495       +14452
248t ehcleanup          |  370     -2|  128618755       -13740
255t optimized          |  362     -8|  128576810       -41945
256r expand             |  356     -6|  128899768      +322958
258r into_cfglayout     |  353     -3|  129051765      +151997
259r jump               |  354     +1|  129051765
262r cse1               |  353     -1|  129051765
275r loop2_unroll       |  355     +2|  132182110     +3130345
277r loop2_done         |  354     -1|  132182109           -1
312r pro_and_epilogue   |  371    +17|  132222324       +40215
323r bbro               |  375     +4|  132095926      -126398

Without the patch at jump2 time we get over 432 mismatches, so 15%
improvement.  Some of the mismatches are unavoidable.

I think ch mismatches are mostly due to loop header copying where the header
condition constant propagates.  The most common case should be threadable in
early optimizations, and we could also do better on profile updating here.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

        PR tree-optimization/103680
        * cfg.cc (update_bb_profile_for_threading): Fix profile update;
        make message clearer.

gcc/testsuite/ChangeLog:

        PR tree-optimization/103680
        * gcc.dg/tree-ssa/pr103680.c: New test.
        * gcc.dg/tree-prof/cmpsf-1.c: Un-xfail.
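The arithmetic behind the ordering problem described in this message can be sketched in a few lines of C. This is only an illustration, not the code in cfg.cc: the function names are hypothetical, and plain doubles stand in for GCC's count and probability types.

```c
/* Illustration only: hypothetical helpers, not GCC's profile API.  */

/* Fraction of BB's executions that belong to the redirected path.
   It is taken against BB's count before the path is removed.  */
double
redirected_fraction (double bb_count, double path_count)
{
  return path_count / bb_count;
}

/* The ordering described as wrong: subtract first, divide second.
   The result can exceed 1.0 whenever the path is more than half of BB.  */
double
redirected_fraction_wrong_order (double bb_count, double path_count)
{
  double remaining = bb_count - path_count;
  return path_count / remaining;
}

/* Probability of edge E in what is left of BB, assuming every redirected
   execution would have left BB through E.  */
double
remaining_edge_probability (double bb_count, double path_count,
                            double edge_prob)
{
  double edge_count = edge_prob * bb_count - path_count;
  return edge_count / (bb_count - path_count);
}
```

For a block executed 100 times, with 60 executions on the threaded path and E taken 80% of the time, the first helper gives 0.6, the wrong ordering gives 60/40 = 1.5, and the probability of E in the remaining block is (80 - 60) / 40 = 0.5.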
Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]
On Fri, Jun 30, 2023 at 5:36 PM Palmer Dabbelt wrote:
>
> On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:
> > On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta wrote:
> >>
> >> On 6/30/23 16:50, Andrew Waterman wrote:
> >> > I don't believe this is correct; the subtraction is needed to account
> >> > for the fact that the low part might be negative, resulting in a
> >> > borrow from the high part.  See the output for your test case below:
> >> >
> >> > $ cat test.c
> >> > #include <stdio.h>
> >> >
> >> > int main()
> >> > {
> >> >   unsigned long result, tmp;
> >> >
> >> >   asm (
> >> >     "li      %1,-252645376\n"
> >> >     "addi    %1,%1,240\n"
> >> >     "slli    %0,%1,32\n"
> >> >     "add     %0,%0,%1"
> >> >     : "=r" (result), "=r" (tmp));
> >> >
> >> >   printf("%lx\n", result);
> >> >
> >> >   return 0;
> >> > }
> >> > $ riscv64-unknown-elf-gcc -O2 test.c
> >> > $ spike pk a.out
> >> > bbl loader
> >> > f0f0f0eff0f0f0f0
> >> > $
> >>
> >> Thx for the quick feedback Andrew. I'm clearly lacking in signed math :-(
> >> So is it possible to have a better code seq for the testcase at all ?
> >
> > You're welcome!
> >
> > When Zba is implemented, then inserting a zext.w would do the trick;
> > see below.  (The generalization is that the zext.w is needed if the
> > 32-bit constant is negative.)  When Zba is not implemented, I think
> > the original sequence is optimal.
> >
> >     li      a5, -252645376
> >     addi    a5, a5, 240
> >     slli    a0, a5, 32
> >     zext.w  a5, a5
> >     add     a0, a0, a5
>
> For the non-Zba case, I think we can leverage the two high parts
> starting out the same to save an instruction generating the constant.
> So for the original code sequence of
>
>     li      a5,-252645376
>     addi    a5,a5,241
>     li      a0,-252645376
>     slli    a5,a5,32
>     addi    a0,a0,240
>     add     a0,a5,a0
>     ret
>
> we could instead generate
>
>     li      a5,-252645376
>     addi    a0,a5,240
>     addi    a5,a5,241
>     slli    a5,a5,32
>     add     a0,a5,a0
>     ret
>
> which IIUC produces the same result.  I think something along the lines
> of this (with the corresponding cost function updates) would do it
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de578b5b899..32b6033a966 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>    rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>
>    riscv_move_integer (hi, hi, hival, mode);
> -  riscv_move_integer (lo, lo, loval, mode);
> +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost (loval)) {
> +    rtx delta = gen_reg_rtx (mode);
> +    riscv_move_integer (delta, delta, loval - hival, mode);
> +    lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
> +  } else {
> +    riscv_move_integer (lo, lo, loval, mode);
> +  }
>
>    hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
>    hi = force_reg (mode, hi);
>
> though I suppose that would produce a slightly different sequence that has the
> same number of instructions but a slightly longer dependency chain, something
> more like
>
>     li      a5,-252645376
>     addi    a5,a5,241
>     addi    a0,a5,-1
>     slli    a5,a5,32
>     add     a0,a5,a0
>     ret
>
> Take that all with a grain of salt, though, as I just ate some very spicy
> chicken and can barely see straight :)

Yeah, that might end up being a false economy for superscalars.

In general, I wouldn't recommend spending too many cleverness beans on
non-Zba+Zbb implementations.  Going forward, we should expect that
even very simple cores provide those extensions.

> >>
> >> -Vineet
> >>
> >> >
> >> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta wrote:
> >> >>
> >> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >> >>> Ran into a minor snafu in const splitting code when playing with test
> >> >>> case from an old PR/23813.
> >> >>>
> >> >>>    long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >> >>>
> >> >>> This currently generates
> >> >>>
> >> >>>    li      a5,-252645376
> >> >>>    addi    a5,a5,241
> >> >>>    li      a0,-252645376
> >> >>>    slli    a5,a5,32
> >> >>>    addi    a0,a0,240
> >> >>>    add     a0,a5,a0
> >> >>>    ret
> >> >>>
> >> >>> The signed math in hival extraction introduces an additional bit,
> >> >>> causing loval == hival check to fail.
> >> >>>
> >> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
> >> >>> | 702    unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> | (gdb)n
> >> >>> | 703    unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >> >>> | (gdb)
> >> >> FWIW (and I missed adding this observation to the change
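The borrow discussed in this thread can be reproduced on any host with plain 64-bit integer arithmetic. The sketch below is an assumption-laden model, not GCC code: `sext32` only imitates what `sext_hwi (v, 32)` is described as doing in the quoted gdb session, and the other function names are made up for the example.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of V (a stand-in for sext_hwi (v, 32)).  */
int64_t
sext32 (uint64_t v)
{
  uint64_t low = v & 0xffffffffu;
  return (int64_t) (low ^ 0x80000000u) - (int64_t) 0x80000000u;
}

/* Low part of the split: sign-extended low 32 bits.  */
int64_t
split_lo (uint64_t val)
{
  return sext32 (val);
}

/* High part of the split: subtracting the low part first accounts for
   the borrow when the low part is negative.  */
int64_t
split_hi (uint64_t val)
{
  uint64_t lo = (uint64_t) sext32 (val);
  return sext32 ((val - lo) >> 32);
}

/* Model of "slli; add" reusing the sign-extended low part for both halves.  */
uint64_t
combine_shift_add (int64_t lo)
{
  uint64_t u = (uint64_t) lo;
  return (u << 32) + u;
}

/* Same, but with the low half zero-extended first, as zext.w would do.  */
uint64_t
combine_shift_add_zext (int64_t lo)
{
  uint64_t u = (uint64_t) lo;
  return (u << 32) + (u & 0xffffffffu);
}

/* Model of the sequence with separate high and low parts.  */
uint64_t
combine_hi_lo (int64_t hi, int64_t lo)
{
  return ((uint64_t) hi << 32) + (uint64_t) lo;
}
```

For 0xF0F0F0F0F0F0F0F0 the low part is -252645136 and the high part is one larger, which matches the 240/241 immediates in the generated code. Reusing the low part without zero extension yields 0xF0F0F0EFF0F0F0F0, the value printed by the spike run above, while the zero-extended variant and the hi/lo variant both reproduce the original constant.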
[PATCH 1/2] Fix PR 110487: invalid signed boolean value
This fixes the first part of this bug, where `a ? -1 : 0` would put
a value of 1 into the signed boolean value.
It fixes the problem by casting to an integer type of
the same size/signedness before doing the negation and
then casting to the type of the expression.

OK? Bootstrapped and tested on x86_64.

gcc/ChangeLog:

        * match.pd (a?-1:0): Cast to an integer type rather than to the
        type before the negation.
        (a?0:-1): Likewise.
---
 gcc/match.pd | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 45c72e733a5..a0d114f6a16 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4703,7 +4703,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     /* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
        here as the powerof2cst case above will handle that case correctly.  */
     (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
-     (negate (convert (convert:boolean_type_node @0))
+     (with {
+       auto prec = TYPE_PRECISION (type);
+       auto unsign = TYPE_UNSIGNED (type);
+       tree inttype = build_nonstandard_integer_type (prec, unsign);
+      }
+      (convert (negate (convert:inttype (convert:boolean_type_node @0
    (if (integer_zerop (@1))
     (with {
       tree booltrue = constant_boolean_node (true, boolean_type_node);
@@ -4722,7 +4727,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     /* a ? -1 : 0 -> -(!a).  No need to check the TYPE_PRECISION not being 1
        here as the powerof2cst case above will handle that case correctly.  */
     (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
-     (negate (convert (bit_xor (convert:boolean_type_node @0) { booltrue; }
+     (with {
+       auto prec = TYPE_PRECISION (type);
+       auto unsign = TYPE_UNSIGNED (type);
+       tree inttype = build_nonstandard_integer_type (prec, unsign);
+      }
+      (convert
+       (negate
+        (convert:inttype
+         (bit_xor (convert:boolean_type_node @0) { booltrue; } )
+        )
+       )
+      )
+     )
+    )
    )
   )
  )
-- 
2.31.1
[PATCH 2/2] PR 110487: `(a !=/== CST1 ? CST2 : CST3)` pattern for type safety
The problem here is we might produce some values out of the type's
min/max (and/or valid values, e.g. signed booleans).  The fix is to use
an integer type which has the same precision and signedness
as the original type.

Note two_value_replacement in phiopt had the same issue in previous
versions; though I don't know if a problem will show up there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

        PR tree-optimization/110487
        * match.pd (a !=/== CST1 ? CST2 : CST3): Always
        build a nonstandard integer and use that.
---
 gcc/match.pd | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index a0d114f6a16..9748ad8466e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4797,24 +4797,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
         tree type1;
         if ((eqne == EQ_EXPR) ^ (wi::to_wide (@1) == min))
           std::swap (arg0, arg1);
-        if (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (type))
-          {
-            /* Avoid performing the arithmetics in bool type which has different
-               semantics, otherwise prefer unsigned types from the two with
-               the same precision.  */
-            if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
-                || !TYPE_UNSIGNED (type))
-              type1 = TREE_TYPE (@0);
-            else
-              type1 = TREE_TYPE (arg0);
-          }
-        else if (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type))
+        if (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type))
           type1 = TREE_TYPE (@0);
         else
           type1 = type;
-        min = wide_int::from (min, TYPE_PRECISION (type1),
+        auto prec = TYPE_PRECISION (type1);
+        auto unsign = TYPE_UNSIGNED (type1);
+        type1 = build_nonstandard_integer_type (prec, unsign);
+        min = wide_int::from (min, prec,
                               TYPE_SIGN (TREE_TYPE (@0)));
-        wide_int a = wide_int::from (wi::to_wide (arg0), TYPE_PRECISION (type1),
+        wide_int a = wide_int::from (wi::to_wide (arg0), prec,
                                      TYPE_SIGN (type));
         enum tree_code code;
         wi::overflow_type ovf;
@@ -4822,7 +4814,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
           {
             code = PLUS_EXPR;
             a -= min;
-            if (!TYPE_UNSIGNED (type1))
+            if (!unsign)
               {
                 /* lhs is known to be in range [min, min+1] and we want to add a
                    to it.  Check if that operation can overflow for those 2 values
@@ -4836,7 +4828,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
           {
             code = MINUS_EXPR;
             a += min;
-            if (!TYPE_UNSIGNED (type1))
+            if (!unsign)
               {
                 /* lhs is known to be in range [min, min+1] and we want to
                    subtract it from a.  Check if that operation can overflow
                    for those 2
-- 
2.31.1
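The identity that this pattern relies on, as far as the diff comments describe it, is that a select between two adjacent constants on a value known to be in [min, min+1] can be written as one addition or one subtraction. The C sketch below only illustrates that arithmetic; the function names are hypothetical, and it does not model the overflow checks or the choice of type that the real pattern performs.

```c
/* In all four functions X is assumed to be either MIN or MIN + 1.  */

/* Select form, constants ascending: MIN maps to C, MIN + 1 to C + 1.  */
int
select_up (int x, int min, int c)
{
  return x == min ? c : c + 1;
}

/* Arithmetic form (the PLUS_EXPR case): add the constant C - MIN.  */
int
arith_up (int x, int min, int c)
{
  return x + (c - min);
}

/* Select form, constants descending: MIN maps to C, MIN + 1 to C - 1.  */
int
select_down (int x, int min, int c)
{
  return x == min ? c : c - 1;
}

/* Arithmetic form (the MINUS_EXPR case): subtract X from C + MIN.  */
int
arith_down (int x, int min, int c)
{
  return (c + min) - x;
}
```

The additions are only meaningful in a type that can represent the intermediate values, which appears to be why the patch performs them in an integer type of the same precision and signedness instead of in a boolean type.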
[PATCH 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
noce_convert_multiple_sets has been introduced and extended over time to handle
if conversion for blocks with multiple sets.  Currently this is focused on
register moves and rejects any sort of arithmetic operations.

This series is an extension to allow more sequences to take part in if
conversion.  The first patch is a required change to emit correct code and the
second patch whitelists a larger number of operations through
bb_ok_for_noce_convert_multiple_sets.

For targets that have a rich selection of conditional instructions,
like aarch64, I have seen an ~5x increase of profitable if conversions for
multiple set blocks in SPEC benchmarks.  Also tested with a wide variety of
benchmarks and I have not seen performance regressions on either x64 / aarch64.

Some samples that previously resulted in a branch but now better use these
instructions can be seen in the provided test case.

Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
failing with an ICE but I believe that it's not an issue of this change.
force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
is provided through emit_conditional_move.

Manolis Tsamis (2):
  ifcvt: handle sequences that clobber flags in
    noce_convert_multiple_sets
  ifcvt: Allow more operations in multiple set if conversion

 gcc/ifcvt.cc                                  | 109 ++
 .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++
 2 files changed, 127 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

-- 
2.34.1
[PATCH 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
This is an extension of what was done in PR106590.

Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself).  Since this applies only from the
next iteration, it assumes that the sequences generated (in particular seq2)
don't clobber the condition rtx itself before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).

This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration.  This makes it possible to include arithmetic operations
in noce_convert_multiple_sets.

gcc/ChangeLog:

        * ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
        (noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.

Signed-off-by: Manolis Tsamis
---
 gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 0b180b4568f..fd1ce8a1049 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3373,20 +3373,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   return TRUE;
 }
 
-/* Helper function for noce_convert_multiple_sets_1.  If store to
-   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
-
-static void
-check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
-{
-  rtx *p = (rtx *) p0;
-  if (p[0] == NULL_RTX)
-    return;
-  if (reg_overlap_mentioned_p (dest, p[0])
-      || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
-    p[0] = NULL_RTX;
-}
-
 /* This goes through all relevant insns of IF_INFO->then_bb and tries to
    create conditional moves.  In case a simple move sufficis the insn
    should be listed in NEED_NO_CMOV.  The rewired-src cases should be
@@ -3550,9 +3536,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
          creating an additional compare for each.  If successful, costing
          is easier and this sequence is usually preferred.  */
       if (cc_cmp)
-        seq2 = try_emit_cmove_seq (if_info, temp, cond,
-                                   new_val, old_val, need_cmov,
-                                   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+        {
+          seq2 = try_emit_cmove_seq (if_info, temp, cond,
+                                     new_val, old_val, need_cmov,
+                                     &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+
+          /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
+             clobbered.  We can't safely use the sequence in this case.  */
+          if (seq2 && (modified_in_p (cc_cmp, seq2)
+              || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
+            seq2 = NULL;
+        }
 
       /* The backend might have created a sequence that uses the
          condition.  Check this.  */
@@ -3607,21 +3601,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
           return FALSE;
         }
 
-      if (cc_cmp)
+      if (cc_cmp && seq == seq1)
         {
-          /* Check if SEQ can clobber registers mentioned in
-             cc_cmp and/or rev_cc_cmp.  If yes, we need to use
-             only seq1 from that point on.  */
-          rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
-          for (walk = seq; walk; walk = NEXT_INSN (walk))
+          /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
+             If yes, we need to use only seq1 from that point on.
+             Only check when we use seq1 since we have already tested seq2.  */
+          if (modified_in_p (cc_cmp, seq)
+              || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
             {
-              note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
-              if (cc_cmp_pair[0] == NULL_RTX)
-                {
-                  cc_cmp = NULL_RTX;
-                  rev_cc_cmp = NULL_RTX;
-                  break;
-                }
+              cc_cmp = NULL_RTX;
+              rev_cc_cmp = NULL_RTX;
             }
         }
 
-- 
2.34.1
[PATCH 2/2] ifcvt: Allow more operations in multiple set if conversion
Currently the operations allowed for if conversion of a basic block with
multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
bb_ok_for_noce_convert_multiple_sets).

This commit allows more operations (arithmetic, compare, etc) to participate
in if conversion.  The target's profitability hook and ifcvt's costing are
expected to reject sequences that are unprofitable.

This is especially useful for targets which provide a rich selection of
conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
which are currently not used in basic blocks with more than a single set.

gcc/ChangeLog:

        * ifcvt.cc (try_emit_cmove_seq): Modify comments.
        (noce_convert_multiple_sets_1): Modify comments.
        (bb_ok_for_noce_convert_multiple_sets): Allow more operations.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.

Signed-off-by: Manolis Tsamis
---
 gcc/ifcvt.cc                                  | 60 +++--
 .../aarch64/ifcvt_multiple_sets_arithm.c      | 67 +++
 2 files changed, 108 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index fd1ce8a1049..a9e5352a0a0 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3213,13 +3213,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
 /* We have something like:
 
      if (x > y)
-       { i = a; j = b; k = c; }
+       { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
 
    Make it:
 
-     tmp_i = (x > y) ? a : i;
-     tmp_j = (x > y) ? b : j;
-     tmp_k = (x > y) ? c : k;
+     tmp_i = (x > y) ? EXPR_A : i;
+     tmp_j = (x > y) ? EXPR_B : j;
+     tmp_k = (x > y) ? EXPR_C : k;
      i = tmp_i;
      j = tmp_j;
      k = tmp_k;
@@ -3635,11 +3635,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
-/* Return true iff basic block TEST_BB is comprised of only
-   (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  Also check that we have more than one set
-   (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+/* Return true iff basic block TEST_BB is suitable for conversion to a
+   series of conditional moves.  Also check that we have more than one
+   set (other routines can handle a single set better than we would),
+   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
    through the insns store the sum of their potential costs in COST.  */
 
 static bool
@@ -3665,20 +3664,43 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
       rtx dest = SET_DEST (set);
       rtx src = SET_SRC (set);
 
-      /* We can possibly relax this, but for now only handle REG to REG
-         (including subreg) moves.  This avoids any issues that might come
-         from introducing loads/stores that might violate data-race-freedom
-         guarantees.  */
-      if (!REG_P (dest))
+      /* Do not handle anything involving memory loads/stores since it might
+         violate data-race-freedom guarantees.  */
+      if (!REG_P (dest) || contains_mem_rtx_p (src))
         return false;
 
-      if (!((REG_P (src) || CONSTANT_P (src))
-            || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
-              && subreg_lowpart_p (src))))
+      /* Allow a wide range of operations and let the costing function decide
+         if the conversion is worth it later.  */
+      enum rtx_code code = GET_CODE (src);
+      if (!(CONSTANT_P (src)
+            || code == REG
+            || code == SUBREG
+            || code == ZERO_EXTEND
+            || code == SIGN_EXTEND
+            || code == NOT
+            || code == NEG
+            || code == PLUS
+            || code == MINUS
+            || code == AND
+            || code == IOR
+            || code == MULT
+            || code == ASHIFT
+            || code == ASHIFTRT
+            || code == NE
+            || code == EQ
+            || code == GE
+            || code == GT
+            || code == LE
+            || code == LT
+            || code == GEU
+            || code == GTU
+            || code == LEU
+            || code == LTU
+            || code == COMPARE))
         return false;
 
-      /* Destination must be appropriate for a conditional write.  */
-      if (!noce_operand_ok (dest))
+      /* Destination and source must be appropriate.  */
+      if (!noce_operand_ok (dest) || !noce_operand_ok (src))
         return false;
 
       /* We must be able to conditionally move in this mode.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
new file mode 100644
index 00000000000..f29cc72263a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-options "-O2
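At the source level, the transformation described in the comment of this patch (several sets under one condition, each becoming a select) looks roughly like the following. This is a hand-written sketch rather than compiler output: the struct, the function names and the particular expressions are invented for the example, and the real pass works on RTL and is subject to the target's costing.

```c
struct pair { int i; int j; };

/* Branchy form: two sets, each with an arithmetic source, under one test.  */
struct pair
with_branch (int x, int y, int a, int b, struct pair p)
{
  if (x > y)
    {
      p.i = a + 1;
      p.j = b - a;
    }
  return p;
}

/* If-converted form: every set becomes a select into a temporary, and the
   temporaries are copied back afterwards.  The arithmetic now executes
   unconditionally, which is why only side-effect-free register operations
   qualify.  */
struct pair
with_selects (int x, int y, int a, int b, struct pair p)
{
  int tmp_i = (x > y) ? a + 1 : p.i;
  int tmp_j = (x > y) ? b - a : p.j;
  p.i = tmp_i;
  p.j = tmp_j;
  return p;
}
```

Both functions return the same pair for any inputs that do not overflow; on a target with conditional select instructions the second form needs no branch.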
Fix profile updates in copy-header
Hi,
the most common source of profile mismatches is now the copyheader pass.  The
reason is that in the common case the duplicated header condition will become
constant true, and that needs changes in the loop exit condition probability.
While this can be done by jump threading, it is not, since jump threading gives
up on loops.

The copy header pass now has logic to prove that the first exit will become
true, so this patch adds the necessary plumbing to the profile updating.
This is done in gimple_duplicate_sese_region in a way that is specific to this
particular case.  I think the general case is kind of unsolvable and loop-ch is
the only user of the infrastructure.  If we later invent some new users,
maybe we can export the region and region_copy arrays and let the user do the
update.

With the patch we now get:

Pass dump id and name   |static mismatch|dynamic mismatch
                        |in count       |in count
107t cunrolli           |    3     +3|      19237       +19237
127t ch                 |   13    +10|      19237
131t dom                |   39    +26|      19237
133t isolate-paths      |   47     +8|      19237
134t reassoc            |   49     +2|      19237
136t forwprop           |   53     +4|     226943      +207706
159t cddce              |   61     +8|     242222       +15279
161t ldist              |   62     +1|     242222
172t ifcvt              |   66     +4|     415472      +173250
173t vect               |  143    +77|   10859784    +10444312
176t cunroll            |  294   +151|  150357763   +139497979
183t loopdone           |  291     -3|  150289533       -68230
194t tracer             |  322    +31|  153230990     +2941457
195t fre                |  317     -5|  153230990
197t dom                |  286    -31|  154448079     +1217089
199t threadfull         |  293     +7|  154724763      +276684
200t vrp                |  297     +4|  155042448      +317685
204t dce                |  294     -3|  155017073       -25375
206t sink               |  292     -2|  155017073
211t cddce              |  298     +6|  155018657        +1584
255t optimized          |  296     -2|  155018657
256r expand             |  273    -23|  154592622      -426035
258r into_cfglayout     |  268     -5|  154592661          +39
275r loop2_unroll       |  272     +4|  159701866     +5109205
291r ce2                |  270     -2|  159723509
312r pro_and_epilogue   |  290    +20|  159792505       +68996
315r jump2              |  296     +6|  164234016     +4441511
323r bbro               |  294     -2|  159385430     -4848586

So ch introduces 10 new mismatches while originally it did 308.  At bbro the
number of mismatches dropped from 432 to 294.

The biggest offender is now the cunroll pass.  I think it is the case where a
loop has multiple exits and one of the exits becomes false in all but the last
peeled iteration.  This is another case where a non-trivial loop update is
needed.

Honza

gcc/ChangeLog:

        * tree-cfg.cc (gimple_duplicate_sese_region): Add elliminated_edge
        parameter; update profile.
        * tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
        * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Rename to ...
        (static_loop_exit): ... this; return the edge to be eliminated.
        (ch_base::copy_headers): Handle profile updating for eliminated exits.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/ifc-20040816-1.c: Reduce number of mismatches
        from 2 to 1.
        * gcc.dg/tree-ssa/loop-ch-profile-1.c: New test.
        * gcc.dg/tree-ssa/loop-ch-profile-2.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c
index f8a6495cbaa..b55a533e374 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c
@@ -39,4 +39,4 @@ int main1 ()
    which is folded by vectorizer.  Both outgoing edges must have probability
    100% so the resulting profile match after folding.  */
 /* { dg-final { scan-tree-dump-times "Invalid sum of outgoing probabilities 200.0" 1 "ifcvt" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum of incoming counts" 2 "ifcvt" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum of incoming counts" 1 "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
new file mode 100644
index 00000000000..e8bab62b0d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
@@ -0
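For readers less familiar with loop header copying, the shape of the transformation can be shown in C. The sketch is a conceptual model with invented function names; it shows only the control-flow change and says nothing about how the pass or the profile update is implemented.

```c
/* Loop as written: the exit test sits in the header and runs before
   every iteration, including the first.  */
int
sum_original (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* After header copying, conceptually: the test is duplicated in front of
   the loop and the loop itself tests at the bottom.  If N is known to be
   positive at this point, the copied test is constant true and folds away,
   so all loop exits have to be accounted to the remaining test.  */
int
sum_header_copied (const int *a, int n)
{
  int s = 0;
  int i = 0;
  if (i < n)
    do
      {
        s += a[i];
        i++;
      }
    while (i < n);
  return s;
}
```

When the copied test folds to true, the exit probability that used to be split between two copies of the condition belongs entirely to the test left inside the loop, which is the adjustment the message says copy-header was missing.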
Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
> There has to be some kind of mismatch between the patch or testcase
> or what we're looking at to judge success.

Yeah I think the initially posted example was misleading because it
contained an already working example.

> While I really don't see the need to have the bridge pattern, I'm
> still willing to believe that I've missed something, which is why I
> wanted to dive into it myself.  For example, we have heuristics to
> avoid trying too many 4->n combine patterns and we might be tripping
> over that or who knows what.
>
> So my suggestion is that if both of you are getting the desired code,
> then Robin handle the review side of the two patches that introduce
> the helper patterns.

I went over both patches again and given the context they seem
reasonable to me.  I'd propose to go with both of them for now and - in
the meantime - I'm going to brush up on my combine knowledge some time
in the next weeks and get back to this then, hopefully with a better
explanation than my last one.

Regards
 Robin
Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]
On 7/1/23 02:00, Andrew Waterman wrote:

> Yeah, that might end up being a false economy for superscalars.
>
> In general, I wouldn't recommend spending too many cleverness beans on
> non-Zba+Zbb implementations.  Going forward, we should expect that
> even very simple cores provide those extensions.

I suspect you under-estimate how difficult it is to get the distros to
move forward on baseline ISAs.

jeff
[GCC 11][committed] d: Fix ICE in setValue, at d/dmd/dinterpret.c:7013
Hi,

This patch backports an ICE fix from upstream, which is already part of
GCC-12 and later.

When casting null to integer or real, instead of painting the type on
the NullExp, we emplace an IntegerExp/RealExp with the value zero.  This is
the same as when casting from NullExp to bool.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
releases/gcc-11, and backported to releases/gcc-10.

Regards,
Iain.

---
Reviewed-on: https://github.com/dlang/dmd/pull/13172

        PR d/110511

gcc/d/ChangeLog:

        * dmd/dinterpret.c (Interpreter::visit (CastExp *)): Handle casting
        null to int or float.

gcc/testsuite/ChangeLog:

        * gdc.test/compilable/test21794.d: New test.
---
 gcc/d/dmd/dinterpret.c                        | 12 -
 gcc/testsuite/gdc.test/compilable/test21794.d | 52 +++
 2 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/test21794.d

diff --git a/gcc/d/dmd/dinterpret.c b/gcc/d/dmd/dinterpret.c
index ab9d88c660c..d4cfb0caacb 100644
--- a/gcc/d/dmd/dinterpret.c
+++ b/gcc/d/dmd/dinterpret.c
@@ -5792,12 +5792,22 @@ public:
         }
         if (e->to->ty == Tsarray)
             e1 = resolveSlice(e1);
-        if (e->to->toBasetype()->ty == Tbool && e1->type->ty == Tpointer)
+        Type *tobt = e->to->toBasetype();
+        if (tobt->ty == Tbool && e1->type->ty == Tpointer)
         {
             new(pue) IntegerExp(e->loc, e1->op != TOKnull, e->to);
             result = pue->exp();
             return;
         }
+        else if (tobt->isTypeBasic() && e1->op == TOKnull)
+        {
+            if (tobt->isintegral())
+                new(pue) IntegerExp(e->loc, 0, e->to);
+            else if (tobt->isreal())
+                new(pue) RealExp(e->loc, CTFloat::zero, e->to);
+            result = pue->exp();
+            return;
+        }
         result = ctfeCast(pue, e->loc, e->type, e->to, e1);
     }
 
diff --git a/gcc/testsuite/gdc.test/compilable/test21794.d b/gcc/testsuite/gdc.test/compilable/test21794.d
new file mode 100644
index 00000000000..68e504bce56
--- /dev/null
+++ b/gcc/testsuite/gdc.test/compilable/test21794.d
@@ -0,0 +1,52 @@
+// https://issues.dlang.org/show_bug.cgi?id=21794
+/*
+TEST_OUTPUT:
+---
+0
+0u
+0L
+0LU
+0.0F
+0.0
+0.0L
+---
+*/
+
+bool fun(void* p) {
+    const x = cast(ulong)p;
+    return 1;
+}
+
+static assert(fun(null));
+
+T fun2(T)(void* p) {
+    const x = cast(T)p;
+    return x;
+}
+
+// These were an error before, they were returning a NullExp instead of IntegerExp/RealExp
+
+static assert(fun2!int(null)    == 0);
+static assert(fun2!uint(null)   == 0);
+static assert(fun2!long(null)   == 0);
+static assert(fun2!ulong(null)  == 0);
+static assert(fun2!float(null)  == 0);
+static assert(fun2!double(null) == 0);
+static assert(fun2!real(null)   == 0);
+
+// These were printing 'null' instead of the corresponding number
+
+const i = cast(int)null;
+const ui = cast(uint)null;
+const l = cast(long)null;
+const ul = cast(ulong)null;
+const f = cast(float)null;
+const d = cast(double)null;
+const r = cast(real)null;
+pragma(msg, i);
+pragma(msg, ui);
+pragma(msg, l);
+pragma(msg, ul);
+pragma(msg, f);
+pragma(msg, d);
+pragma(msg, r);
-- 
2.39.2
Re: [pushed] wwwdocs: Add GCC Code of Conduct
On Tue, 20 Jun 2023, Jason Merrill via Gcc-patches wrote:
> As announced on gcc@.

Here is a minor follow-up that I just pushed.

Gerald


From f87deaa12cccb4b7398a8ec3b306cb4185aae012 Mon Sep 17 00:00:00 2001
From: Gerald Pfeifer
Date: Fri, 30 Jun 2023 14:59:27 +0200
Subject: [PATCH] conduct: Fix nested lists

---
 htdocs/conduct.html | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/htdocs/conduct.html b/htdocs/conduct.html
index 8fb62e86..da940a47 100644
--- a/htdocs/conduct.html
+++ b/htdocs/conduct.html
@@ -61,7 +61,7 @@ affect a person's ability to participate within them.
 Be careful in the words that you choose. Be kind to others. Do not
 insult or put down other participants.  Harassment and other
 exclusionary behavior aren't acceptable. This includes, but is not limited
- to:
+ to:
 Violent threats or language directed against another person.
@@ -73,6 +73,7 @@ affect a person's ability to participate within them.
 Advocating for, or encouraging, any of the above behavior.
 Repeated harassment of others. In general, if someone asks you to
 stop, then stop.
+
 When we disagree, try to understand why. Disagreements,
 both social and technical, happen all the time and the GCC community is no
-- 
2.41.0
Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]
On Sat, 01 Jul 2023 07:04:16 PDT (-0700), jeffreya...@gmail.com wrote:
>
> On 7/1/23 02:00, Andrew Waterman wrote:
>
>> Yeah, that might end up being a false economy for superscalars.
>>
>> In general, I wouldn't recommend spending too many cleverness beans on
>> non-Zba+Zbb implementations.  Going forward, we should expect that
>> even very simple cores provide those extensions.
>
> I suspect you under-estimate how difficult it is to get the distros to
> move forward on baseline ISAs.

Ya, we haven't even gotten to the point where most implementations are
shipping with the B extensions, much less to the point where we can
start ignoring all the pre-B hardware.
Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]
On Sat, Jul 1, 2023 at 7:04 AM Jeff Law wrote: > > > > On 7/1/23 02:00, Andrew Waterman wrote: > > > > > Yeah, that might end up being a false economy for superscalars. > > > > In general, I wouldn't recommend spending too many cleverness beans on > > non-Zba+Zbb implementations. Going forward, we should expect that > > even very simple cores provide those extensions. > I suspect you under-estimate how difficult it is to get the distros to > move forward on baseline ISAs. Yeah, true. > > jeff
[committed] d: Don't generate code that throws exceptions when compiling with `-fno-exceptions'
Hi, The version flags for RTMI, RTTI, and exceptions were unconditionally predefined. These are now only predefined if the feature flag is enabled. It was noticed that there was no `-fexceptions' definition inside d/lang.opt, so the detection of the exceptions option flag was only partially working. Once that was fixed, a few places in the front-end implementation were found to fall foul of `nothrow' rules; these have been fixed upstream and backported here as well. Bootstrapped and regression tested on x86_64-linux-gnu{-m64,-m32}, committed to mainline, and backported to releases/gcc-13. Regards, Iain. --- Reviewed-on: https://github.com/dlang/dmd/pull/15357 https://github.com/dlang/dmd/pull/15360 PR d/110471 gcc/d/ChangeLog: * d-builtins.cc (d_init_versions): Predefine D_ModuleInfo, D_Exceptions, and D_TypeInfo only if feature is enabled. * lang.opt: Add -fexceptions. gcc/testsuite/ChangeLog: * gdc.dg/pr110471a.d: New test. * gdc.dg/pr110471b.d: New test. * gdc.dg/pr110471c.d: New test. 
(cherry picked from commit da108c75ad386b3f1f47abb2265296e4b61d578a) --- gcc/d/d-builtins.cc | 9 ++--- gcc/d/dmd/root/array.d | 2 +- gcc/d/dmd/semantic2.d| 3 +-- gcc/d/dmd/semantic3.d| 2 +- gcc/d/lang.opt | 4 gcc/testsuite/gdc.dg/pr110471a.d | 5 + gcc/testsuite/gdc.dg/pr110471b.d | 5 + gcc/testsuite/gdc.dg/pr110471c.d | 5 + 8 files changed, 28 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gdc.dg/pr110471a.d create mode 100644 gcc/testsuite/gdc.dg/pr110471b.d create mode 100644 gcc/testsuite/gdc.dg/pr110471c.d diff --git a/gcc/d/d-builtins.cc b/gcc/d/d-builtins.cc index f40888019ce..60f76fc694c 100644 --- a/gcc/d/d-builtins.cc +++ b/gcc/d/d-builtins.cc @@ -500,9 +500,12 @@ d_init_versions (void) VersionCondition::addPredefinedGlobalIdent ("D_BetterC"); else { - VersionCondition::addPredefinedGlobalIdent ("D_ModuleInfo"); - VersionCondition::addPredefinedGlobalIdent ("D_Exceptions"); - VersionCondition::addPredefinedGlobalIdent ("D_TypeInfo"); + if (global.params.useModuleInfo) + VersionCondition::addPredefinedGlobalIdent ("D_ModuleInfo"); + if (global.params.useExceptions) + VersionCondition::addPredefinedGlobalIdent ("D_Exceptions"); + if (global.params.useTypeInfo) + VersionCondition::addPredefinedGlobalIdent ("D_TypeInfo"); } if (optimize) diff --git a/gcc/d/dmd/root/array.d b/gcc/d/dmd/root/array.d index 541a12d9e1d..d1c61be7344 100644 --- a/gcc/d/dmd/root/array.d +++ b/gcc/d/dmd/root/array.d @@ -574,7 +574,7 @@ unittest private template arraySortWrapper(T, alias fn) { pragma(mangle, "arraySortWrapper_" ~ T.mangleof ~ "_" ~ fn.mangleof) -extern(C) int arraySortWrapper(scope const void* e1, scope const void* e2) nothrow +extern(C) int arraySortWrapper(scope const void* e1, scope const void* e2) { return fn(cast(const(T*))e1, cast(const(T*))e2); } diff --git a/gcc/d/dmd/semantic2.d b/gcc/d/dmd/semantic2.d index 440e4cbc8e7..ee268d95251 100644 --- a/gcc/d/dmd/semantic2.d +++ b/gcc/d/dmd/semantic2.d @@ -807,9 +807,8 @@ private void 
doGNUABITagSemantic(ref Expression e, ref Expression* lastTag) // but it's a concession to practicality. // Casts are unfortunately necessary as `implicitConvTo` is not // `const` (and nor is `StringExp`, by extension). -static int predicate(const scope Expression* e1, const scope Expression* e2) nothrow +static int predicate(const scope Expression* e1, const scope Expression* e2) { -scope(failure) assert(0, "An exception was thrown"); return (cast(Expression*)e1).toStringExp().compare((cast(Expression*)e2).toStringExp()); } ale.elements.sort!predicate; diff --git a/gcc/d/dmd/semantic3.d b/gcc/d/dmd/semantic3.d index 33a43187fa8..a912e768f0c 100644 --- a/gcc/d/dmd/semantic3.d +++ b/gcc/d/dmd/semantic3.d @@ -1420,7 +1420,7 @@ private extern(C++) final class Semantic3Visitor : Visitor * https://issues.dlang.org/show_bug.cgi?id=14246 */ AggregateDeclaration ad = ctor.isMemberDecl(); -if (!ctor.fbody || !ad || !ad.fieldDtor || !global.params.dtorFields || global.params.betterC || ctor.type.toTypeFunction.isnothrow) +if (!ctor.fbody || !ad || !ad.fieldDtor || !global.params.dtorFields || !global.params.useExceptions || ctor.type.toTypeFunction.isnothrow) return visit(cast(FuncDeclaration)ctor); /* Generate: diff --git a/gcc/d/lang.opt b/gcc/d/lang.opt index 26ca92c4c17..98a95c1dc38 100644 --- a/gcc/d/lang.opt +++ b/gcc/d/lang.opt @@ -291,6 +291,10 @@ fdump-d-original D Display the frontend AST after parsing and semantic passes. +fexceptions +D +; Documented in common.opt + fextern-std= D Joined RejectNegative Enum(extern_stdcpp) Var(flag_extern_stdcpp) -fextern-std=
[pushed] libphobos, testsuite: Disable forkgc2 on Darwin [PR103944]
From: Iain Sandoe This has been in use for some time across all the Darwin versions supported by D. It has also been tested on x86_64-linux-gnu. Approved on irc by Iain Buclaw, pushed to main (and will be backported). thanks Iain --- 8< --- It hangs the testsuite (requiring manual intervention to kill the spawned processes) which breaks CI. The reason for the hang is not clear. This skips the test for now (xfail does not work). Signed-off-by: Iain Sandoe PR d/103944 libphobos/ChangeLog: * testsuite/libphobos.gc/forkgc2.d: Skip for Darwin. --- libphobos/testsuite/libphobos.gc/forkgc2.d | 1 + 1 file changed, 1 insertion(+) diff --git a/libphobos/testsuite/libphobos.gc/forkgc2.d b/libphobos/testsuite/libphobos.gc/forkgc2.d index de7796ced72..38d0d0c2f93 100644 --- a/libphobos/testsuite/libphobos.gc/forkgc2.d +++ b/libphobos/testsuite/libphobos.gc/forkgc2.d @@ -1,3 +1,4 @@ +// { dg-skip-if "test hangs the testsuite PR103944" { *-*-darwin* } } import core.stdc.stdlib : exit; import core.sys.posix.sys.wait : waitpid; import core.sys.posix.unistd : fork; -- 2.39.2 (Apple Git-143)
[PATCH 2/2] xtensa: The use of CLAMPS instruction also requires TARGET_MINMAX, as well as TARGET_CLAMPS
This is because both smin and smax, which require TARGET_MINMAX, are essential to the RTL representation. gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p): Simplify. * config/xtensa/xtensa.md (*xtensa_clamps): Add TARGET_MINMAX to the condition. --- gcc/config/xtensa/xtensa.cc | 7 ++- gcc/config/xtensa/xtensa.md | 4 ++-- 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc index dd35e63c094..3298d53493c 100644 --- a/gcc/config/xtensa/xtensa.cc +++ b/gcc/config/xtensa/xtensa.cc @@ -2649,11 +2649,8 @@ xtensa_emit_add_imm (rtx dst, rtx src, HOST_WIDE_INT imm, rtx scratch, bool xtensa_match_CLAMPS_imms_p (rtx cst_max, rtx cst_min) { - int max, min; - - return IN_RANGE (max = exact_log2 (-INTVAL (cst_max)), 7, 22) -&& IN_RANGE (min = exact_log2 (INTVAL (cst_min) + 1), 7, 22) -&& max == min; + return IN_RANGE (exact_log2 (-INTVAL (cst_max)), 7, 22) +&& (INTVAL (cst_max) + INTVAL (cst_min)) == -1; } diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index b1af08eba8a..664424f1239 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -522,7 +522,7 @@ (smax:SI (smin:SI (match_operand:SI 1 "register_operand" "r") (match_operand:SI 2 "const_int_operand" "i")) (match_operand:SI 3 "const_int_operand" "i")))] - "TARGET_CLAMPS + "TARGET_MINMAX && TARGET_CLAMPS && xtensa_match_CLAMPS_imms_p (operands[3], operands[2])" "#" "&& 1" @@ -540,7 +540,7 @@ (smin:SI (smax:SI (match_operand:SI 1 "register_operand" "r") (match_operand:SI 2 "const_int_operand" "i")) (match_operand:SI 3 "const_int_operand" "i")))] - "TARGET_CLAMPS + "TARGET_MINMAX && TARGET_CLAMPS && xtensa_match_CLAMPS_imms_p (operands[2], operands[3])" { static char result[64]; -- 2.30.2
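The simplification of xtensa_match_CLAMPS_imms_p relies on the two forms accepting the same constant pairs: if -cst_max is 2^k and cst_max + cst_min is -1, then cst_min + 1 is also 2^k. The following is only a stand-alone sketch for checking that claim, not GCC code. exact_log2_sketch, in_range, clamps_imms_old, clamps_imms_new and predicates_agree are hypothetical names, plain 64-bit integers stand in for rtx constants, and exact_log2_sketch is a simplified assumption about what exact_log2 returns for non-positive input.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for exact_log2: returns n when x == 2^n, else -1.
// Assumption: non-positive inputs are treated as "not a power of two".
static int
exact_log2_sketch (int64_t x)
{
  if (x <= 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    ++n;
  return n;
}

// Stand-in for the IN_RANGE macro (inclusive bounds).
static bool
in_range (int v, int lo, int hi)
{
  return v >= lo && v <= hi;
}

// The predicate in the shape it had before the patch.
bool
clamps_imms_old (int64_t cst_max, int64_t cst_min)
{
  int max = exact_log2_sketch (-cst_max);
  int min = exact_log2_sketch (cst_min + 1);
  return in_range (max, 7, 22) && in_range (min, 7, 22) && max == min;
}

// The predicate in the shape it has after the patch.
bool
clamps_imms_new (int64_t cst_max, int64_t cst_min)
{
  return in_range (exact_log2_sketch (-cst_max), 7, 22)
         && cst_max + cst_min == -1;
}

// Compare both forms over every pair of values in [-limit, limit].
bool
predicates_agree (int64_t limit)
{
  for (int64_t a = -limit; a <= limit; ++a)
    for (int64_t b = -limit; b <= limit; ++b)
      if (clamps_imms_old (a, b) != clamps_imms_new (a, b))
        return false;
  return true;
}
```

Calling predicates_agree (600) covers the exponents 7, 8 and 9; larger exponents can be spot-checked with individual pairs such as (-4194304, 4194303).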
[PATCH 1/2] xtensa: Fix missing mode warning in "*eqne_INT_MIN"
gcc/ChangeLog: * config/xtensa/xtensa.md (*eqne_INT_MIN): Add missing ":SI" to the match_operator. --- gcc/config/xtensa/xtensa.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index 4b4ab3f5f37..b1af08eba8a 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -3191,7 +3191,7 @@ (define_insn_and_split "*eqne_INT_MIN" [(set (match_operand:SI 0 "register_operand" "=a") - (match_operator 2 "boolean_operator" + (match_operator:SI 2 "boolean_operator" [(match_operand:SI 1 "register_operand" "r") (const_int -2147483648)]))] "TARGET_ABS" -- 2.30.2
New Croatian PO file for 'gcc' (version 13.1.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Croatian team of translators. The file is available at: https://translationproject.org/latest/gcc/hr.po (This file, 'gcc-13.1.0.hr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
[PATCH] Use chain_next on eh_landing_pad_d for GTY (PR middle-end/110510)
The backtrace in the bug report suggests that the stack runs out during garbage collection because of a long chain of eh_landing_pad_d. This might fix that by adding chain_next to eh_landing_pad_d's GTY marker. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR middle-end/110510 * except.h (struct eh_landing_pad_d): Add chain_next GTY. --- gcc/except.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/except.h b/gcc/except.h index 378a9e4cb77..173b0f026db 100644 --- a/gcc/except.h +++ b/gcc/except.h @@ -66,7 +66,7 @@ enum eh_region_type /* A landing pad for a given exception region. Any transfer of control from the EH runtime to the function happens at a landing pad. */ -struct GTY(()) eh_landing_pad_d +struct GTY((chain_next("%h.next_lp"))) eh_landing_pad_d { /* The linked list of all landing pads associated with the region. */ struct eh_landing_pad_d *next_lp; -- 2.31.1
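To illustrate why chain_next can help here: a marker that recurses on the next pointer uses one stack frame per list element, while one that walks the chain in a loop uses constant stack. The sketch below is a toy model only, not the code gengtype generates. Node, mark_recursive, mark_chained and build_and_mark are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a GC-managed list node (not eh_landing_pad_d).
struct Node
{
  Node *next = nullptr;
  bool marked = false;
};

// Marking by recursion on the next pointer.  Unless the compiler turns the
// tail call into a loop, stack depth grows with the length of the list.
void
mark_recursive (Node *n)
{
  if (n == nullptr || n->marked)
    return;
  n->marked = true;
  mark_recursive (n->next);
}

// Marking by following the chain in a loop; stack depth stays constant.
void
mark_chained (Node *n)
{
  for (; n != nullptr && !n->marked; n = n->next)
    n->marked = true;
}

// Build a list of LEN nodes, mark it with the loop-based marker, free the
// list, and return how many nodes were found marked.
std::size_t
build_and_mark (std::size_t len)
{
  Node *head = nullptr;
  for (std::size_t i = 0; i < len; ++i)
    {
      Node *n = new Node;
      n->next = head;
      head = n;
    }
  mark_chained (head);
  std::size_t marked = 0;
  while (head != nullptr)
    {
      Node *next = head->next;
      marked += head->marked ? 1 : 0;
      delete head;
      head = next;
    }
  return marked;
}
```

With a list of a million nodes, mark_chained completes with constant stack use, whereas mark_recursive may exhaust the stack depending on optimization level and stack limit.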
[PATCH] gcc-ar: Handle response files properly [PR77576]
Basically implementing what Andrew said in the PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77576

If @file has been passed to gcc-ar, do the following:
1) Expand it to get an argv without any @files.
2) Then apply the plugin modifications to argv.
3) Create temporary response file.
4) Put the modified argv in the temporary file.
5) Call ar with @tmp.
6) Delete the temporary response file.

0001-gcc-ar-Handle-response-files-properly-PR77576.patch Description: Binary data
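The six steps can be sketched roughly as follows. This is not the attached patch, which is not shown in the message; it is a simplified illustration under stated assumptions. All function names (expand_response_files, add_plugin_args, write_response_file, build_ar_args) are hypothetical, arguments are split on whitespace with no quoting or nested @file support, and the exact form of the plugin arguments is assumed to be "--plugin <path>" placed at the front.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Step 1: replace every "@file" argument with the whitespace-separated words
// found in that file.  An "@file" that cannot be opened is kept unchanged.
std::vector<std::string>
expand_response_files (const std::vector<std::string> &args)
{
  std::vector<std::string> out;
  for (const std::string &arg : args)
    {
      if (arg.size () > 1 && arg[0] == '@')
        {
          std::ifstream in (arg.substr (1));
          if (in)
            {
              std::string word;
              while (in >> word)
                out.push_back (word);
              continue;
            }
        }
      out.push_back (arg);
    }
  return out;
}

// Step 2: prepend the plugin option (assumed form; path given by the caller).
std::vector<std::string>
add_plugin_args (const std::vector<std::string> &args,
                 const std::string &plugin_path)
{
  std::vector<std::string> out = { "--plugin", plugin_path };
  out.insert (out.end (), args.begin (), args.end ());
  return out;
}

// Steps 3 and 4: write the modified arguments to a response file, one per line.
bool
write_response_file (const std::string &path,
                     const std::vector<std::string> &args)
{
  std::ofstream out (path);
  for (const std::string &arg : args)
    out << arg << '\n';
  out.flush ();
  return out.good ();
}

// Steps 1-5 combined: return the argument list that would be handed to ar.
// Step 6, deleting TMP_PATH once ar has exited, is left to the caller.
std::vector<std::string>
build_ar_args (const std::vector<std::string> &args,
               const std::string &plugin_path, const std::string &tmp_path)
{
  std::vector<std::string> expanded
    = add_plugin_args (expand_response_files (args), plugin_path);
  if (!write_response_file (tmp_path, expanded))
    return expanded;  // Could not write; pass the arguments directly.
  return { "@" + tmp_path };
}
```

A caller would run ar with the returned list and then remove the temporary file, for example with std::remove.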
[committed] d: Fix accesses of immutable arrays using constant index still bounds checked
Hi, This patch sets TREE_READONLY on all non-static const and immutable variables in D, as well as all static immutable variables that aren't initialized by a module constructor. This allows more aggressive constant folding of D code which makes use of `immutable' or `const'. Bootstrapped and regression tested on x86_64-linux-gnu, committed to mainline, and backported to releases/gcc-13 and releases/gcc-12. Regards, Iain. --- PR d/110514 gcc/d/ChangeLog: * decl.cc (get_symbol_decl): Set TREE_READONLY on certain kinds of const and immutable variables. * expr.cc (ExprVisitor::visit (ArrayLiteralExp *)): Set TREE_READONLY on immutable dynamic array literals. gcc/testsuite/ChangeLog: * gdc.dg/pr110514a.d: New test. * gdc.dg/pr110514b.d: New test. * gdc.dg/pr110514c.d: New test. * gdc.dg/pr110514d.d: New test. --- gcc/d/decl.cc| 14 ++ gcc/d/expr.cc| 4 gcc/testsuite/gdc.dg/pr110514a.d | 9 + gcc/testsuite/gdc.dg/pr110514b.d | 8 gcc/testsuite/gdc.dg/pr110514c.d | 8 gcc/testsuite/gdc.dg/pr110514d.d | 8 6 files changed, 51 insertions(+) create mode 100644 gcc/testsuite/gdc.dg/pr110514a.d create mode 100644 gcc/testsuite/gdc.dg/pr110514b.d create mode 100644 gcc/testsuite/gdc.dg/pr110514c.d create mode 100644 gcc/testsuite/gdc.dg/pr110514d.d diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc index 78c4ab554dc..3f980851259 100644 --- a/gcc/d/decl.cc +++ b/gcc/d/decl.cc @@ -1277,6 +1277,20 @@ get_symbol_decl (Declaration *decl) DECL_INITIAL (decl->csym) = build_expr (ie, true); } } + + /* [type-qualifiers/const-and-immutable] + +`immutable` applies to data that cannot change. Immutable data values, +once constructed, remain the same for the duration of the program's +execution. */ + if (vd->isImmutable () && !vd->setInCtorOnly ()) + TREE_READONLY (decl->csym) = 1; + + /* `const` applies to data that cannot be changed by the const reference +to that data. It may, however, be changed by another reference to that +same data. 
*/ + if (vd->isConst () && !vd->isDataseg ()) + TREE_READONLY (decl->csym) = 1; } /* Set the declaration mangled identifier if static. */ diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc index c6245ff5fc1..b7cec1327fd 100644 --- a/gcc/d/expr.cc +++ b/gcc/d/expr.cc @@ -2701,6 +2701,10 @@ public: if (tb->ty == TY::Tarray) ctor = d_array_value (type, size_int (e->elements->length), ctor); + /* Immutable data can be placed in rodata. */ + if (tb->isImmutable ()) + TREE_READONLY (decl) = 1; + d_pushdecl (decl); rest_of_decl_compilation (decl, 1, 0); } diff --git a/gcc/testsuite/gdc.dg/pr110514a.d b/gcc/testsuite/gdc.dg/pr110514a.d new file mode 100644 index 000..46e370527d3 --- /dev/null +++ b/gcc/testsuite/gdc.dg/pr110514a.d @@ -0,0 +1,9 @@ +// { dg-do "compile" } +// { dg-options "-O -fdump-tree-optimized" } +immutable uint[] imm_arr = [1,2,3]; +int test_imm(immutable uint[] ptr) +{ +return imm_arr[2] == 3 ? 123 : 456; +} +// { dg-final { scan-assembler-not "_d_arraybounds_indexp" } } +// { dg-final { scan-tree-dump "return 123;" optimized } } diff --git a/gcc/testsuite/gdc.dg/pr110514b.d b/gcc/testsuite/gdc.dg/pr110514b.d new file mode 100644 index 000..86aeb485c34 --- /dev/null +++ b/gcc/testsuite/gdc.dg/pr110514b.d @@ -0,0 +1,8 @@ +// { dg-do "compile" } +// { dg-options "-O" } +immutable uint[] imm_ctor_arr; +int test_imm_ctor(immutable uint[] ptr) +{ +return imm_ctor_arr[2] == 3; +} +// { dg-final { scan-assembler "_d_arraybounds_indexp" } } diff --git a/gcc/testsuite/gdc.dg/pr110514c.d b/gcc/testsuite/gdc.dg/pr110514c.d new file mode 100644 index 000..94779e123a4 --- /dev/null +++ b/gcc/testsuite/gdc.dg/pr110514c.d @@ -0,0 +1,8 @@ +// { dg-do "compile" } +// { dg-options "-O" } +const uint[] cst_arr = [1,2,3]; +int test_cst(const uint[] ptr) +{ +return cst_arr[2] == 3; +} +// { dg-final { scan-assembler "_d_arraybounds_indexp" } } diff --git a/gcc/testsuite/gdc.dg/pr110514d.d b/gcc/testsuite/gdc.dg/pr110514d.d new file mode 100644 index 000..56e9a3139ea --- /dev/null 
+++ b/gcc/testsuite/gdc.dg/pr110514d.d @@ -0,0 +1,8 @@ +// { dg-do "compile" } +// { dg-options "-O" } +const uint[] cst_ctor_arr; +int test_cst_ctor(const uint[] ptr) +{ +return cst_ctor_arr[2] == 3; +} +// { dg-final { scan-assembler "_d_arraybounds_indexp" } } -- 2.39.2
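The two conditions added to get_symbol_decl can be read as a small decision function. The sketch below mirrors only those two `if' statements. VarInfo and would_be_readonly are hypothetical names, and the field comments show which front-end query each flag is meant to stand for. How the four new tests map onto flag values is an assumption made for illustration, not something stated in the patch.

```cpp
#include <cassert>

// Hypothetical flattened view of the VarDeclaration queries used in the patch.
struct VarInfo
{
  bool is_immutable;      // stands for vd->isImmutable ()
  bool is_const;          // stands for vd->isConst ()
  bool set_in_ctor_only;  // stands for vd->setInCtorOnly ()
  bool is_dataseg;        // stands for vd->isDataseg ()
};

// Mirror of the two conditions that set TREE_READONLY in get_symbol_decl.
bool
would_be_readonly (const VarInfo &v)
{
  // Immutable data never changes once constructed, unless construction is
  // deferred to a constructor.
  if (v.is_immutable && !v.set_in_ctor_only)
    return true;
  // Const data may still change through another reference, so only
  // variables outside the data segment qualify.
  if (v.is_const && !v.is_dataseg)
    return true;
  return false;
}
```

Under that reading, an initialized module-level immutable array (as in pr110514a.d) is read-only, while the other three tests keep their bounds check.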
[committed] d: Fix core.volatile.volatileLoad discarded if result is unused
Hi, The first pass of code generation in the D front-end splits up all compound expressions and discards expressions that have no side effects. This included calls to the `volatileLoad' intrinsic if its result was not used, causing such calls to be eliminated from the program. We already set TREE_THIS_VOLATILE on the expression; however, the tree documentation says that if this bit is set in an expression, so is TREE_SIDE_EFFECTS. So set TREE_SIDE_EFFECTS on the expression too. This prevents any early discarding from occurring. Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed to mainline, and backported to releases/gcc-13, gcc-12, and gcc-11. Regards, Iain. --- PR d/110516 gcc/d/ChangeLog: * intrinsics.cc (expand_volatile_load): Set TREE_SIDE_EFFECTS on the expanded expression. (expand_volatile_store): Likewise. gcc/testsuite/ChangeLog: * gdc.dg/torture/pr110516a.d: New test. * gdc.dg/torture/pr110516b.d: New test. --- gcc/d/intrinsics.cc | 2 ++ gcc/testsuite/gdc.dg/torture/pr110516a.d | 12 gcc/testsuite/gdc.dg/torture/pr110516b.d | 12 3 files changed, 26 insertions(+) create mode 100644 gcc/testsuite/gdc.dg/torture/pr110516a.d create mode 100644 gcc/testsuite/gdc.dg/torture/pr110516b.d diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc index 0121d81eb14..aaf04e50baa 100644 --- a/gcc/d/intrinsics.cc +++ b/gcc/d/intrinsics.cc @@ -1007,6 +1007,7 @@ expand_volatile_load (tree callexp) tree type = build_qualified_type (TREE_TYPE (ptrtype), TYPE_QUAL_VOLATILE); tree result = indirect_ref (type, ptr); TREE_THIS_VOLATILE (result) = 1; + TREE_SIDE_EFFECTS (result) = 1; return result; } @@ -1034,6 +1035,7 @@ expand_volatile_store (tree callexp) tree type = build_qualified_type (TREE_TYPE (ptrtype), TYPE_QUAL_VOLATILE); tree result = indirect_ref (type, ptr); TREE_THIS_VOLATILE (result) = 1; + TREE_SIDE_EFFECTS (result) = 1; /* (*(volatile T *) ptr) = value; */ tree value = CALL_EXPR_ARG (callexp, 1); diff --git a/gcc/testsuite/gdc.dg/torture/pr110516a.d 
b/gcc/testsuite/gdc.dg/torture/pr110516a.d new file mode 100644 index 000..276455ae408 --- /dev/null +++ b/gcc/testsuite/gdc.dg/torture/pr110516a.d @@ -0,0 +1,12 @@ +// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110516 +// { dg-do compile } +// { dg-options "-fno-moduleinfo -fdump-tree-optimized" } +void fn110516(ubyte* ptr) +{ +import core.volatile : volatileLoad; +volatileLoad(ptr); +volatileLoad(ptr); +volatileLoad(ptr); +volatileLoad(ptr); +} +// { dg-final { scan-tree-dump-times " ={v} " 4 "optimized" } } diff --git a/gcc/testsuite/gdc.dg/torture/pr110516b.d b/gcc/testsuite/gdc.dg/torture/pr110516b.d new file mode 100644 index 000..b7a67e716a5 --- /dev/null +++ b/gcc/testsuite/gdc.dg/torture/pr110516b.d @@ -0,0 +1,12 @@ +// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110516 +// { dg-do compile } +// { dg-options "-fno-moduleinfo -fdump-tree-optimized" } +void fn110516(ubyte* ptr) +{ +import core.volatile : volatileStore; +volatileStore(ptr, 0); +volatileStore(ptr, 0); +volatileStore(ptr, 0); +volatileStore(ptr, 0); +} +// { dg-final { scan-tree-dump-times " ={v} " 4 "optimized" } } -- 2.39.2
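The interaction between the early discarding pass and the two tree flags can be pictured with a toy model. This is not front-end code: Expr, discard_unused, make_volatile_load and surviving_loads are hypothetical names, and the pass is reduced to "keep a statement only if its side-effects flag is set", which is an assumption about its behaviour based on the description in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy expression node carrying only the two flags discussed in the message.
struct Expr
{
  bool this_volatile = false;  // models TREE_THIS_VOLATILE
  bool side_effects = false;   // models TREE_SIDE_EFFECTS
};

// Toy version of the early pass: statements without side effects are dropped.
std::vector<Expr>
discard_unused (const std::vector<Expr> &stmts)
{
  std::vector<Expr> kept;
  for (const Expr &e : stmts)
    if (e.side_effects)
      kept.push_back (e);
  return kept;
}

// Build a volatile load whose result is unused.  FIXED selects the behaviour
// after the patch, where the side-effects flag is set as well.
Expr
make_volatile_load (bool fixed)
{
  Expr e;
  e.this_volatile = true;
  if (fixed)
    e.side_effects = true;
  return e;
}

// Number of loads that survive the toy pass out of COUNT identical loads.
std::size_t
surviving_loads (bool fixed, std::size_t count)
{
  std::vector<Expr> stmts (count, make_volatile_load (fixed));
  return discard_unused (stmts).size ();
}
```

In this model, four unused loads without the fix leave none behind, and with the fix all four survive, which is the count the new tests scan for in the optimized dump.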