[Bug rtl-optimization/98791] [11 Regression] ICE in paradoxical_subreg_p (in ira) with SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED CC||law at gcc dot gnu.org --- Comment #5 from Jeffrey A. Law --- Fixed by Andre's patch on the trunk.
[Bug target/95636] ICE in sched2: in create_block_for_bookkeeping, at sel-sched.c:4549
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95636 Jeffrey A. Law changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE CC||law at gcc dot gnu.org --- Comment #1 from Jeffrey A. Law --- Almost certainly a duplicate. *** This bug has been marked as a duplicate of bug 99347 ***
[Bug rtl-optimization/99347] [9/10/11 Regression] ICE in create_block_for_bookkeeping, at sel-sched.c:4549 since r9-6859-g25eafae67f186cfa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99347 Jeffrey A. Law changed: What|Removed |Added CC||qianchao9 at huawei dot com --- Comment #4 from Jeffrey A. Law --- *** Bug 95636 has been marked as a duplicate of this bug. ***
[Bug rtl-optimization/98791] [10 Regression] ICE in paradoxical_subreg_p (in ira) with SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791 Jeffrey A. Law changed: What|Removed |Added Summary|[11 Regression] ICE in |[10 Regression] ICE in |paradoxical_subreg_p (in|paradoxical_subreg_p (in |ira) with SVE |ira) with SVE --- Comment #7 from Jeffrey A. Law --- In that case, make sure to update the Summary/Title so that it appears on the right regression lists :-)
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #12 from Jeffrey A. Law --- IIRC LSM is quite restricted in the types of MEM expressions it will optimize. In particular I think they have to be SYMBOL_REFs which severely limits LSM's effectiveness. I would support removing it given that it's not enabled by default anywhere and is of limited utility.
[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #14 from Jeffrey A. Law ---
WRT Jim's comment about alignments in c#6. Right now a pointer's alignment is really only used to eliminate unnecessary masking -- we don't propagate a pointer's known alignment to improve the known alignment of memory operations involving that pointer.

This is something I'd cobbled together for a closely related issue which will try to increase the known alignment of a MEM by using the alignment of a pointer to that MEM. We've gone a slightly different (and more effective) route for that internal issue, but this may still be worth polishing a bit and submitting.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 972512e81..be9ff76b5 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -859,6 +859,28 @@ gen_rtx_MEM (machine_mode mode, rtx addr)
      we clear it here.  */
   MEM_ATTRS (rt) = 0;
 
+  /* If we can deduce a higher alignment for the memory access
+     based on the pointer, then it's advantageous to do so.  */
+  unsigned int align = 0;
+  if (REG_P (addr)
+      && REG_POINTER (addr))
+    align = REGNO_POINTER_ALIGN (REGNO (addr));
+  else if (GET_CODE (addr) == PLUS
+           && REG_P (XEXP (addr, 0))
+           && REG_POINTER (XEXP (addr, 0))
+           && REGNO_POINTER_ALIGN (REGNO (XEXP (addr, 0)))
+           && GET_CODE (XEXP (addr, 1)) == CONST_INT)
+    {
+      unsigned int tmp = 1 << (ffs_hwi (INTVAL (XEXP (addr, 1))) - 1);
+      /* ALIGN is in bits.  */
+      tmp <<= 3;
+      align = REGNO_POINTER_ALIGN (REGNO (XEXP (addr, 0)));
+      align = (align > tmp) ? tmp : align;
+    }
+
+  if (align > mode_mem_attrs[(int) mode]->align)
+    set_mem_align (rt, align);
+
   return rt;
 }
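The arithmetic in the patch above may be easier to follow in isolation. The following is a standalone sketch, not GCC code: the alignment known for `base + offset` is limited both by the base pointer's alignment and by the largest power of two dividing the offset. The function names are hypothetical, and handling a zero offset separately is a choice made for this sketch only.

```c
#include <assert.h>

/* Largest power of two that divides OFFSET, in bytes.  OFFSET must be
   nonzero.  Negative offsets work as well, because the lowest set bit
   of a two's complement value matches that of its magnitude.  */
static unsigned long
lowest_set_bit (long offset)
{
  unsigned long u = (unsigned long) offset;
  assert (u != 0);
  return u & (~u + 1);
}

/* Alignment in bits that can be assumed for the address BASE + OFFSET,
   given that BASE is known to be aligned to BASE_ALIGN_BITS.
   (Hypothetical helper, for illustration only.)  */
static unsigned int
known_align_bits (unsigned int base_align_bits, long offset)
{
  assert (base_align_bits % 8 == 0);

  /* Sketch-specific choice: a zero offset keeps the base alignment.  */
  if (offset == 0)
    return base_align_bits;

  /* Compare in bytes first so the conversion to bits cannot overflow.  */
  unsigned long off_align_bytes = lowest_set_bit (offset);
  if (off_align_bytes < base_align_bits / 8)
    return (unsigned int) (off_align_bytes * 8);
  return base_align_bits;
}
```

For example, a base pointer known to be 64-bit aligned plus an offset of 4 bytes can only be assumed to be 32-bit aligned, while the same base plus 16 bytes keeps the full 64-bit alignment.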
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #22 from Jeffrey A. Law --- I have vague memories of it, but it wasn't my code. It was actually Craig Burley's. Its original purpose was merely to allow converting *_DIV_EXPR into EXACT_DIV_EXPR, which presumably was important for some g77 cases way back then.
[Bug tree-optimization/100727] New: [12 Regression] Recent change to WITH_SIZE_EXPR handling breaks mn10300-elf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100727

            Bug ID: 100727
           Summary: [12 Regression] Recent change to WITH_SIZE_EXPR handling breaks mn10300-elf
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

2e6ad1ba532fe684633edac766c598be19ad3b59 is the first bad commit
commit 2e6ad1ba532fe684633edac766c598be19ad3b59
Author: Richard Biener
Date:   Wed May 19 10:20:37 2021 +0200

    Enable more WITH_SIZE_EXPR processing

    This enables the alias machinery for WITH_SIZE_EXPR which can appear
    in call LHS and arguments.  In particular this drops the NULL
    return from get_base_address and it adjusts get_ref_base_and_extent
    and friends to use the size information in WITH_SIZE_EXPR and
    look through it for further processing.

    2021-05-19  Richard Biener

            * builtins.c (get_object_alignment_1): Strip outer WITH_SIZE_EXPR.
            * tree-dfa.c (get_ref_base_and_extent): Handle outer
            WITH_SIZE_EXPR for size processing and process the containing ref.
            * tree-ssa-alias.c (ao_ref_base_alias_set): Strip outer WITH_SIZE_EXPR.
            (ao_ref_base_alias_ptr_type): Likewise.
            (refs_may_alias_p_2): Allow WITH_SIZE_EXPR in ref->ref and handle
            that accordingly, stripping it for the core alias workers.
            * tree.c (get_base_address): Handle WITH_SIZE_EXPR by looking
            through it instead of returning NULL.

Causes a correctness regression on mn10300-elf for c-torture/execute/20020412-1.c at -O2. It appears to me that the assignments to x and y before the call to foo get erroneously removed.

What's particularly interesting here is the .optimized dumps are the same, but the .expand dumps differ significantly. Obviously this points to a problem not in how this change affects the gimple optimizers, but how it affects the gimple/tree->RTL translation.

http://3.14.90.209:8080/job/mn10300-elf/
[Bug tree-optimization/100727] [12 Regression] Recent change to WITH_SIZE_EXPR handling breaks mn10300-elf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100727 Jeffrey A. Law changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-05-23 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
[Bug tree-optimization/100727] [12 Regression] Recent change to WITH_SIZE_EXPR handling breaks mn10300-elf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100727 --- Comment #1 from Jeffrey A. Law --- The v850-elf port is also seeing these failures in some of its multilib configurations.
[Bug bootstrap/100730] h8300-linux: unused parameter, statement may fall through, control reaches end of non-void function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100730 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |law at gcc dot gnu.org CC||law at gcc dot gnu.org Status|UNCONFIRMED |NEW Last reconfirmed||2021-05-23
[Bug bootstrap/100730] h8300-linux: unused parameter, statement may fall through, control reaches end of non-void function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100730 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #1 from Jeffrey A. Law --- Bah! Missed the tag in the ChangeLog to get the commit added to this BZ. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571470.html
[Bug tree-optimization/96674] Failure to optimize combination of comparisons to dec+compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96674 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED CC||law at gcc dot gnu.org Resolution|--- |FIXED --- Comment #11 from Jeffrey A. Law --- Resolved by Eugene's patch on the trunk.
[Bug middle-end/19987] [meta-bug] fold missing optimizations in general
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 Bug 19987 depends on bug 96674, which changed state. Bug 96674 Summary: Failure to optimize combination of comparisons to dec+compare https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96674 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug other/100735] -fno-trampolines doc wrongly implies it affects C, C++ etc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100735 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED CC||law at gcc dot gnu.org --- Comment #5 from Jeffrey A. Law --- Fixed with Paul's documentation change on the trunk.
[Bug tree-optimization/100934] [9/10/11/12 Regression] wrong code at -O3 during unrolling since r9-6299
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100934 --- Comment #7 from Jeffrey A. Law --- So when we're finding jump threads we know if we thread through the loop latch and we note when that's going to create an irreducible region. We generally suppress threading through the latch before the loop optimizers have run, but allow it afterwards. But I'm not aware of a really good place to adjust the loop bound estimates, particularly for the backwards threader. The backwards threader uses the copy_bbs API, so much of the guts of what's happening is opaque. Peek at jump_thread_path_registry::duplicate_thread_path. All the backwards threader bits go through there at some point.
[Bug tree-optimization/101186] predictable comparison of integer variables not folded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101186 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #6 from Jeffrey A. Law --- I wonder why we don't unswitch the loop twice to pull both conditionals out. If we were to do that, the a != 0 loop versions can be removed because they'll do nothing other than increase the loop counter. The a==0 variant will just have the printf and increment of b.
[Bug tree-optimization/108398] tree-object-size trips up with pointer arithmetic if an intermediate result is an invalid pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108398 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #1 from Jeffrey A. Law --- The compiler will sometimes create pointers outside any object -- the loop optimizers in particular will tend to do that. For the actual memory access, an offset will be applied to get the effective address of the memory reference into the proper object. It's also the case that Ada can create these inherently via "virtual origins" IIRC. I'm not sure this qualifies as a bug.
[Bug target/108484] [13 Regression] ICE building glibc for ia64 in cselib_subst_to_values, at cselib.cc:2148
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108484 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #1 from Jeffrey A. Law --- Created attachment 54325 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54325&action=edit Testcase for c6x
[Bug target/108484] [13 Regression] ICE building glibc for ia64 in cselib_subst_to_values, at cselib.cc:2148
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108484 Jeffrey A. Law changed: What|Removed |Added Last reconfirmed||2023-01-21 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #2 from Jeffrey A. Law --- I'm seeing this for c6x as well building libgcc. Testcase attached, compile with -O2 -g.
[Bug testsuite/108723] New: [13 Regression] Recent changes broke risc-v testsuite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108723

            Bug ID: 108723
           Summary: [13 Regression] Recent changes broke risc-v testsuite
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 3cd08f7168c196d7a481b9ed9f4289fd1f14eea8 (refs/bisect/bad)
Author: Andreas Schwab
Date:   Wed Jan 25 12:00:09 2023 +0100

    riscv: Enable -fasynchronous-unwind-tables by default on Linux

Broke several tests in the risc-v testsuite for riscv64-unknown-linux-gnu:

FAIL: gcc.target/riscv/shorten-memrefs-2.c -Os scan-assembler store1a:\n\taddi
FAIL: gcc.target/riscv/shorten-memrefs-2.c -Os scan-assembler load1r:\n\taddi
FAIL: gcc.target/riscv/shorten-memrefs-2.c -Os scan-assembler load2r:\n\taddi
XPASS: gcc.target/riscv/shorten-memrefs-3.c -Os scan-assembler-not load1a:\n\taddi
FAIL: gcc.target/riscv/shorten-memrefs-5.c -Os scan-assembler store1a:\n\taddi
FAIL: gcc.target/riscv/shorten-memrefs-5.c -Os scan-assembler load1r:\n\taddi
XPASS: gcc.target/riscv/shorten-memrefs-6.c -Os scan-assembler-not load1a:\n\taddi
FAIL: gcc.target/riscv/shorten-memrefs-8.c -Os scan-assembler store:\n\taddi\ta[0-7],a[0-7],1
FAIL: gcc.target/riscv/shorten-memrefs-8.c -Os scan-assembler load:\n\taddi\ta[0-7],a[0-7],1

I'm pretty sure the change causes new labels to be inserted in places where the scan-assembler strings are not expecting to find them.
[Bug target/108248] Some insns in the risc-v backend do not have mappings to functional units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108248 --- Comment #5 from Jeffrey A. Law --- So a datapoint in this effort. For the Veyron V1, all the bitmanip instructions except clmul and cpop are single cycle and can be handled by any of the 4 standard ALUs. clmul and cpop are 4c and use the shared multi-cycle ALU. Obviously we may need to break things down further for other uarchs. But that's a start.
[Bug target/108764] [RISCV] Cost model for RVB is too aggressive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #6 from Jeffrey A. Law --- For Veyron V1, those shNadds are single cycle and can issue to any of the 4 ALUs which makes the first sequence better for that uarch. As folks have noted, costing is highly dependent on the uarch. So in cases like this the right way to go is describe the costing in the uarch specific structure and query that much like we do for other operations.
[Bug target/108892] [13 Regression] unable to generate reloads for at -Og on riscv64 since r13-4907
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108892 Jeffrey A. Law changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |law at gcc dot gnu.org --- Comment #3 from Jeffrey A. Law ---
This looks like a long-standing latent bug in combine.

About 20 years ago support was added to combine to allow it to utilize a REG_EQUAL note to help simplifications. The code looks like this:

          /* Temporarily replace the set's source with the
             contents of the REG_EQUAL note.  The insn will
             be deleted or recognized by try_combine.  */
          rtx orig_src = SET_SRC (set);
          rtx orig_dest = SET_DEST (set);
          if (GET_CODE (SET_DEST (set)) == ZERO_EXTRACT)
            SET_DEST (set) = XEXP (SET_DEST (set), 0);
          SET_SRC (set) = note;
          i2mod = temp;
          i2mod_old_rhs = copy_rtx (orig_src);
          i2mod_new_rhs = copy_rtx (note);
          next = try_combine (insn, i2mod, NULL, NULL,
                              &new_direct_jump_p,
                              last_combined_insn);
          i2mod = NULL;
          if (next)
            {
              statistics_counter_event (cfun, "insn-with-note combine", 1);
              goto retry;
            }
          SET_SRC (set) = orig_src;
          SET_DEST (set) = orig_dest;

So assume that temp (from which SET in the above code was extracted) looks like this:

(insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
        (ior:DI (reg:DI 200)
            (reg:DI 251))) "j.c":14:5 -1
     (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
        (nil)))

Basically the constant isn't one we can load with a single insn, so we construct the constant using several insns and attach a REG_EQUAL note for the final result. Totally normal.

We replace the SET_SRC with the contents of the note. This results in the following insn that gets passed down to try_combine as I2MOD:

(insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
        (const_int 25769803782 [0x600000006])) "j.c":14:5 -1
     (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
        (nil)))

Exactly what I would expect. try_combine will actually try to recognize that insn and gets a match on the mvconst_internal. So when try_combine returns the insn looks like this:

(insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
        (const_int 25769803782 [0x600000006])) "j.c":14:5 177 {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
        (nil)))

Again, totally normal. Nothing wrong at this point. But note how we restore the SET_SRC/SET_DEST objects after returning from try_combine. After restoration we have:

(insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
        (ior:DI (reg:DI 200)
            (reg:DI 251))) "j.c":14:5 177 {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
        (nil)))

And that's where things have gone off the rails. When we restore the SET_SRC/SET_DEST we need to either re-recognize or at least clear the INSN_CODE.

This has *absolutely nothing* to do with multiple matching patterns. It's just a latent bug in the combiner.
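The failure mode described above is a stale cached recognition result. Here is a deliberately tiny toy model of it in C. None of this is GCC code: `toy_insn`, `toy_recog` and `code_after_failed_attempt` are hypothetical stand-ins, the pattern is modelled as a string, 177 is the code shown in the dump, and 42 is made up for the IOR pattern.

```c
#include <assert.h>
#include <string.h>

#define TOY_CODE_UNKNOWN (-1)  /* plays the role of an unset INSN_CODE */
#define TOY_CODE_MVCONST 177   /* number taken from the dump above */
#define TOY_CODE_IOR 42        /* made up for this sketch */

/* Toy stand-in for an insn: a source pattern plus a cached code.  */
struct toy_insn
{
  const char *src;
  int code;
};

/* Toy recognizer: like a cached recognition result, it only looks at
   the pattern when no code has been recorded yet.  */
static int
toy_recog (struct toy_insn *insn)
{
  if (insn->code == TOY_CODE_UNKNOWN)
    insn->code = (strncmp (insn->src, "const_int", 9) == 0
                  ? TOY_CODE_MVCONST : TOY_CODE_IOR);
  return insn->code;
}

/* Model the sequence from the comment: swap in the note, let the
   combine attempt recognize the insn, then restore the source.
   Returns the code a later recognition of the restored insn sees.  */
static int
code_after_failed_attempt (int clear_code_on_restore)
{
  struct toy_insn insn = { "ior:DI (reg:DI 200) (reg:DI 251)",
                           TOY_CODE_UNKNOWN };
  const char *orig_src = insn.src;

  insn.src = "const_int 25769803782";  /* contents of the note */
  toy_recog (&insn);                   /* caches TOY_CODE_MVCONST */

  insn.src = orig_src;                 /* restore the original source */
  if (clear_code_on_restore)
    insn.code = TOY_CODE_UNKNOWN;      /* the missing step */

  return toy_recog (&insn);
}
```

Without the reset, the restored IOR insn still claims to be the constant-load pattern (`code_after_failed_attempt (0)` yields 177). With the reset it is recognized afresh (`code_after_failed_attempt (1)` yields 42).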
[Bug target/109092] [13 Regression] ICE: RTL check: expected code 'reg', have 'subreg' in rhs_regno, at rtl.h:1932 when building libgcc on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109092 --- Comment #4 from Jeffrey A. Law --- Also note there's an unsafe REGNO in peephole.md as well. Slightly different in form, but still unprotected and thus for well crafted inputs could probably cause an ICE or incorrect codegen (in a non-checking compiler).
[Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073 --- Comment #19 from Jeffrey A. Law ---
I stumbled over this as well at some point. One thing I started playing with, but had to set aside, was making vect_get_range_info smarter. In particular, in the case I was looking at, VAR would have a single use that was a narrowing conversion. Taking advantage of that narrowing conversion would tend to allow us to use VxQI and VxHI shifts more often.

It's just something we noticed, but never chased down if it was important in terms of real world code generation.

I see two patches in my stash. No idea the state on either one, but they might point you at something useful...

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 803de3fc287..43369eb8f4e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -58,6 +58,37 @@ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
   value_range_kind vr_type = get_range_info (var, min_value, max_value);
   wide_int nonzero = get_nonzero_bits (var);
   signop sgn = TYPE_SIGN (TREE_TYPE (var));
+
+  /* If VAR has a single use in a narrowing conversion, then we may be
+     able to use the narrowing conversion to get a tighter range.  */
+  gimple *use_stmt;
+  use_operand_p use;
+  if (vr_type == VR_VARYING
+      && single_imm_use (var, &use, &use_stmt)
+      && is_gimple_assign (use_stmt)
+      && gimple_assign_rhs_code (use_stmt) == NOP_EXPR)
+    {
+      /* We know VAR has a single use that is a conversion.  Now check
+         if it is a narrowing conversion.  */
+      tree lhs = gimple_assign_lhs (use_stmt);
+      unsigned HOST_WIDE_INT orig_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)));
+      unsigned HOST_WIDE_INT lhs_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)));
+
+      if (lhs_size < orig_size)
+        {
+          /* The single use of VAR was a narrowing conversion.
+             Use the nonzero bits from the narrower type and
+             the min/max values of VAR's type.
+
+             This allows the intersect call below to work in the expected way.  */
+          nonzero = get_nonzero_bits (lhs);
+          sgn = TYPE_SIGN (TREE_TYPE (lhs));
+          *min_value = wi::to_wide (vrp_val_min (TREE_TYPE (lhs)));
+          *max_value = wi::to_wide (vrp_val_min (TREE_TYPE (lhs)));
+          vr_type = VR_RANGE;
+        }
+    }
+
   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
                                          nonzero, sgn) == VR_RANGE)
     {

And another variant:

@@ -74,6 +74,38 @@ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
     }
   else
     {
+      /* Try a bit harder to get a narrowed range.  If VAR has a single use that
+         is a conversion, see if we can use the converted range.  */
+      gimple *stmt;
+      use_operand_p use;
+      if (single_imm_use (var, &use, &stmt)
+          && is_gimple_assign (stmt)
+          && gimple_assign_rhs_code (stmt) == NOP_EXPR)
+        {
+          /* If this is a narrowing conversion, then we win as it
+             narrows the range of VAR.  */
+          tree lhs = gimple_assign_lhs (stmt);
+          unsigned HOST_WIDE_INT orig_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)));
+          unsigned HOST_WIDE_INT lhs_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)));
+          if (lhs_size < orig_size)
+            {
+              *min_value = wi::to_wide (TYPE_MIN_VALUE (TREE_TYPE (lhs)));
+              *max_value = wi::to_wide (TYPE_MAX_VALUE (TREE_TYPE (lhs)));
+              if (dump_enabled_p ())
+                {
+                  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+                  dump_printf (MSG_NOTE, " has range [");
+                  dump_hex (MSG_NOTE, *min_value);
+                  dump_printf (MSG_NOTE, ", ");
+                  dump_hex (MSG_NOTE, *max_value);
+                  dump_printf (MSG_NOTE, "]\n");
+                }
+              return true;
+            }
+
+
+        }
+
       if (dump_enabled_p ())
         {
           dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
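As a scalar illustration of why a single narrowing use matters, consider a left shift whose result is immediately truncated to 8 bits. This is only a sketch of the underlying arithmetic, not of what the vectorizer does: the function names are hypothetical, and the equivalence shown is specific to left shifts by less than the narrow width.

```c
#include <assert.h>
#include <stdint.h>

/* Shift in the wide type, then narrow: the form the source code has.  */
static uint8_t
shift_wide_then_narrow (uint32_t x, unsigned int n)
{
  assert (n < 8);
  return (uint8_t) (x << n);
}

/* Narrow first, then shift: only the low 8 bits of X can reach the
   low 8 bits of the result, so the shift could be done on 8-bit
   elements instead of 32-bit ones.  */
static uint8_t
narrow_then_shift (uint32_t x, unsigned int n)
{
  assert (n < 8);
  return (uint8_t) ((uint8_t) x << n);
}
```

Both functions return the same value for every input, for instance 0xC0 for `x = 0x12345678` and `n = 3`. That equivalence is the kind of fact that lets a shift be carried out in a narrower element type.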
[Bug tree-optimization/101895] [11 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101895 --- Comment #9 from Jeffrey A. Law --- Yea, no need to backport this.
[Bug tree-optimization/101895] [11 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101895 --- Comment #10 from Jeffrey A. Law --- And just an FYI. As expected this resolves the regression on our internal target. Thanks Roger!
[Bug rtl-optimization/104154] [12 Regression] Another ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 --- Comment #12 from Jeffrey A. Law --- Just to confirm. Yes, that patch fixed the problem.
[Bug tree-optimization/104987] New: [12 Regression] Recent change causing vrp13.c regressions on several targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104987

            Bug ID: 104987
           Summary: [12 Regression] Recent change causing vrp13.c regressions on several targets
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 8db155ddf8cec9e31f0a4b8d80cc67db2c7a26f9 (refs/bisect/bad)
Author: Andrew MacLeod
Date:   Thu Mar 17 10:52:10 2022 -0400

    Always use dominators in the cache when available.

    This patch adjusts range_from_dom to follow the dominator tree through the
    cache until value is found, then apply any outgoing ranges encountered
    along the way.  This reduces the amount of cache storage required.

            PR tree-optimization/102943
            * gimple-range-cache.cc (ranger_cache::range_from_dom): Find range via
            dominators and apply intermediary outgoing edge ranges.

Is causing gcc.dg/tree-ssa/vrp13.c to fail on a couple targets (iq2000-elf, v850e-elf).

It looks like we're mis-compiling foo_mult. Here's the reduced testcase:

/* { dg-do run }  */
/* { dg-options -O2 }  */

extern void abort (void);
extern void link_error (void);

int
foo_mult (int i, int j)
{
  int k;

  /* [-20, -10] * [2, 10] should give [-200, -20].  */
  if (i >= -20)
    if (i <= -10)
      if (j >= 2)
        if (j <= 10)
          {
            k = i * j;
            if (k < -200)
              link_error ();
            if (k > -20)
              link_error ();

            return k;
          }

  /* [-20, -10] * [-10, -2] should give [20, 200].  */
  if (i >= -20)
    if (i <= -10)
      if (j >= -10)
        if (j <= -2)
          {
            k = i * j;
            if (k < 20)
              link_error ();
            if (k > 200)
              link_error ();

            return k;
          }

  /* [-20, 10] * [2, 10] should give [-200, 100].  */
  if (i >= -20)
    if (i <= 10)
      if (j >= 2)
        if (j <= 10)
          {
            k = i * j;
            if (k < -200)
              link_error ();
            if (k > 100)
              link_error ();

            return k;
          }

  /* [-20, 10] * [-10, -2] should give [-100, 200].  */
  if (i >= -20)
    if (i <= 10)
      if (j >= -10)
        if (j <= -2)
          {
            k = i * j;
            if (k < -100)
              link_error ();
            if (k > 200)
              link_error ();

            return k;
          }

  /* [10, 20] * [2, 10] should give [20, 200].  */
  if (i >= 10)
    if (i <= 20)
      if (j >= 2)
        if (j <= 10)
          {
            k = i * j;
            if (k < 20)
              link_error ();
            if (k > 200)
              link_error ();

            return k;
          }

  /* [10, 20] * [-10, -2] should give [-200, -20].  */
  if (i >= 10)
    if (i <= 20)
      if (j >= -10)
        if (j <= -2)
          {
            k = i * j;
            if (k < -200)
              link_error ();
            if (k > -20)
              link_error ();

            return k;
          }

  abort ();
}

int
main ()
{
  if (foo_mult (10, -2) != -20)
    abort ();

  return 0;
}

The symptom on the v850 is we get the sign wrong on the multiplication. I haven't looked into what goes wrong on iq2000-elf.
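The ranges quoted in the testcase comments follow from the usual rule for multiplying two closed intervals: the result's bounds are the minimum and maximum of the four corner products. A minimal sketch of that rule, with a hypothetical `toy_range` type and `mul_range` helper (this is not the ranger's implementation and ignores overflow):

```c
#include <assert.h>

/* Closed integer interval [lo, hi].  Hypothetical type for this sketch.  */
struct toy_range
{
  long lo;
  long hi;
};

/* Product of two intervals: take the min and max over the four
   corner products.  Assumes the products do not overflow.  */
static struct toy_range
mul_range (struct toy_range a, struct toy_range b)
{
  long p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
  struct toy_range r = { p[0], p[0] };

  for (int i = 1; i < 4; i++)
    {
      if (p[i] < r.lo)
        r.lo = p[i];
      if (p[i] > r.hi)
        r.hi = p[i];
    }
  return r;
}
```

With this rule, [10, 20] * [-10, -2] gives [-200, -20], so the value main expects from foo_mult (10, -2), namely -20, sits exactly on the upper bound of the last case.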
[Bug libgcc/86224] [m68k] textrels in libgcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86224 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED CC||law at gcc dot gnu.org --- Comment #3 from Jeffrey A. Law --- Fixed on the trunk.
[Bug tree-optimization/104987] [12 Regression] Recent change causing vrp13.c regressions on several targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104987 --- Comment #2 from Jeffrey A. Law --- It does trigger execution failures on those targets. I guess it's possible it's merely exposing existing bugs on those targets. If we were inlining before, we may have collapsed the test away completely. Let me do some poking over here.
[Bug tree-optimization/104987] [12 Regression] Recent change causing vrp13.c regressions on several targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104987 --- Comment #6 from Jeffrey A. Law --- For the v850 at least, I'm starting to think this is a simulator bug. In particular the simulator code doesn't look safe on a 64bit host for a signed input to the MUL instruction.
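The simulator source is not quoted here, so the following is only an illustration of the general class of problem, not the actual v850 code. When a 32-bit register value is widened into a 64-bit host type without sign extension, a signed multiply sees a large positive operand instead of a negative one. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widen both 32-bit register values by zero extension before
   multiplying.  A negative operand is treated as a large positive
   number, so the full product has the wrong sign.  */
static int64_t
mul_wide_zero_extended (uint32_t a, uint32_t b)
{
  return (int64_t) a * (int64_t) b;
}

/* Reinterpret each register as a signed 32-bit value first, then
   widen.  The uint32_t to int32_t conversion is implementation-defined
   for values above INT32_MAX, but wraps on two's complement hosts.  */
static int64_t
mul_wide_sign_extended (uint32_t a, uint32_t b)
{
  return (int64_t) (int32_t) a * (int64_t) (int32_t) b;
}
```

For the operands from the testcase, 10 and -2 (register contents 0xFFFFFFFE), the zero-extended version yields 42949672940 while the sign-extended version yields -20. The low 32 bits of the two results agree, so a mistake of this kind shows up only where the upper half of the product or a sign test on the operands is used.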
[Bug tree-optimization/104987] [12 Regression] Recent change causing vrp13.c regressions on several targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104987 --- Comment #7 from Jeffrey A. Law --- Highly confident this is a simulator bug for the v850. I haven't looked at iq2000-elf yet, but I wouldn't be surprised if that turns out to be something similar.
[Bug tree-optimization/104987] [12 Regression] Recent change causing vrp13.c regressions on several targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104987 Jeffrey A. Law changed: What|Removed |Added Target|iq2000-elf, v850e-elf |iq2000-elf Priority|P3 |P4 --- Comment #8 from Jeffrey A. Law --- V850 simulator fix has been posted to the binutils list. I've never really hacked the iq2000, but from the looks of things I think it's mis-compiling mulsi3 in libgcc. In particular, I don't think it's handling delay slots properly for the bbi instruction. reorg has tagged it as a nullified-if-false branch, but it appears that we're using the wrong form at assembly time and the instruction in the delay slot always executes. So P4 as this appears to be an iq2000 specific issue.
[Bug target/104987] [12 Regression] Recent change causing vrp13.c regressions on several targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104987 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #10 from Jeffrey A. Law --- Fixed on the trunk.
[Bug tree-optimization/91645] Missed optimization with sqrt(x*x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645 --- Comment #8 from Jeffrey A. Law ---
First, there's magic bits which turn a standard sqrt call into something like

if (exceptional condition)
  call libm's sqrt
else
  use hardware sqrt

The primary goal is to get errno set properly for those exceptional conditions. That code (tree-ssa-math-opts.cc?) can be updated to query the range to determine if the exceptional conditions can happen. Essentially that eliminates the need for -fno-math-errno.

That may be enough for targets with real sqrt instructions. More work is likely needed for targets that use an estimator + correction steps.
[Bug tree-optimization/107114] [13 Regression] Failure to discover range results in bogus warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107114 Jeffrey A. Law changed: What|Removed |Added Priority|P3 |P4 CC||law at gcc dot gnu.org
[Bug bootstrap/107119] Bootstrap ICE on 32-bit ARM after r13-2871-g1b74b5cb4e9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107119 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #2 from Jeffrey A. Law ---
Does it still happen after this:

Author: Jeff Law
Date:   Tue Sep 27 01:44:38 2022 -0400

    Fix ICEs due to recent jump-to-return optimization

    gcc/
            * cfgrtl.cc (fixup_reorder_chain): Verify that simple_return
            and return are available before trying to use them.
[Bug tree-optimization/107114] [13 Regression] Failure to discover range results in bogus warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107114 --- Comment #2 from Jeffrey A. Law --- Which is just uber-weird. The change in question removes a little subloop which becomes unreachable. Why that would cause us to be unable to analyze the remaining key loop for the IV's range is a complete mystery. Though I guess I'll have to sit down and debug that a bit. VRP is just calling into the loop optimizer to do the IV analysis, right? WRT the new blocks -- I strongly suspect they're part of normalization of the loop and putting it into LCSSA form. I'm not terribly worried about them. Typically they're just going to be creating empty loop latches.
[Bug tree-optimization/107114] [13 Regression] Failure to discover range results in bogus warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107114 --- Comment #4 from Jeffrey A. Law --- I'll double check, but IIRC we throw away the loop structures at the end of DOM and they're supposed to be rebuilt (which appears to be happening as we re-construct LCSSA).
[Bug rtl-optimization/107182] [13 Regression] Commit r13-2871-g1b74b5cb4e9d7191f298245063a8f9c3a1bbeff4 breaks profiledbootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107182 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2022-10-07 --- Comment #2 from Jeffrey A. Law --- I've managed to reproduce this failure.
[Bug rtl-optimization/107182] [13 Regression] Commit r13-2871-g1b74b5cb4e9d7191f298245063a8f9c3a1bbeff4 breaks profiledbootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107182 --- Comment #3 from Jeffrey A. Law --- Testing a trivial patch now.
[Bug rtl-optimization/107182] [13 Regression] Commit r13-2871-g1b74b5cb4e9d7191f298245063a8f9c3a1bbeff4 breaks profiledbootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107182 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Jeffrey A. Law --- Should be fixed now.
[Bug tree-optimization/107229] [13 Regression] ICE at -O1 and -Os with "-ftree-vectorize": verify_gimple failed since r13-3219-g25413fdb2ac24933
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107229 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #3 from Jeffrey A. Law ---
Note that we're seeing linux kernel build failures that may have the same underlying problem:

./cc1 -O2 -mabi=64 main.i -quiet -I./
../drivers/base/power/main.c: In function ‘__device_suspend’:
../drivers/base/power/main.c:1606:12: error: invalid ‘PHI’ argument
_142
during GIMPLE pass: ifcvt
../drivers/base/power/main.c:1606:12: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:87
0x176e2f4 tree_class_check_failed(tree_node const*, tree_code_class, char const*, int, char const*)
        /home/jlaw/test/gcc/gcc/tree.cc:8877
0xa058bb tree_class_check(tree_node*, tree_code_class, char const*, int, char const*)
        /home/jlaw/test/gcc/gcc/tree.h:3649
0xe9369a useless_type_conversion_p(tree_node*, tree_node*)
        /home/jlaw/test/gcc/gcc/gimple-expr.cc:87
0x13a4bd3 verify_gimple_phi
        /home/jlaw/test/gcc/gcc/tree-cfg.cc:5201
0x13a57dd verify_gimple_in_cfg(function*, bool)
        /home/jlaw/test/gcc/gcc/tree-cfg.cc:5530

I'll monitor this bug and re-test the kernel build once Andre has a potential patch. If it turns out to be a distinct problem, then I'll open a new bug.
[Bug tree-optimization/107275] New: [13 Regression] Recent ifcvt changes resulting in references to SSA_NAME on free list
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107275 Bug ID: 107275 Summary: [13 Regression] Recent ifcvt changes resulting in references to SSA_NAME on free list Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: law at gcc dot gnu.org Target Milestone: --- This change: commit 25413fdb2ac24933214123e24ba165026452a6f2 (HEAD, refs/bisect/bad) Author: Andre Vieira Date: Tue Oct 11 10:49:27 2022 +0100 vect: Teach vectorizer how to handle bitfield accesses Is resulting in a PHI node with a dangling SSA_NAME. This in turn causes the compiler to abort when compiling the linux kernel on mips64-linux-gnu. If I put a breakpoint in exit, then go up the call chain enough we'll see that we're calling verify_gimple_phi for this PHI: #8 0x013a4bd4 in verify_gimple_phi (phi=0x777e5f00) at /home/jlaw/test/gcc/gcc/tree-cfg.cc:5201 5201 if (!useless_type_conversion_p (TREE_TYPE (phi_result), TREE_TYPE (t))) (gdb) p debug_gimple_stmt (phi) .MEM_12 = PHI <_7(6), .MEM_4(D)(10)> Note the _7. And if we look at the actual underlying node: gdb) p debug_tree (t) nothrow def_stmt version:7 in-free-list> If I put a breakpoint in ifcvt and look at the key block (#3) it looks like this: ;; basic block 3, loop depth 1 ;;pred: 6 ;;5 # link_10 = PHI # .MEM_12 = PHI <.MEM_7(6), .MEM_4(D)(5)> # .MEM_6 = VDEF <.MEM_12> link_10->direct_complete = 0; # .MEM_7 = VDEF <.MEM_6> __asm__ __volatile__("" : : : "memory"); # VUSE <.MEM_7> link_8 = link_10->c_node.next; if (dev_5(D) != link_8) goto ; [89.00%] else goto ; [11.00%] ;;succ: 6 ;;4 ; Quite sensible. In particular note the reference to MEM_7 in the PHI. 
But when ifcvt is done: ;; basic block 3, loop depth 1 ;;pred: 6 ;;10 # link_10 = PHI # .MEM_12 = PHI <_7(6), .MEM_4(D)(10)> # VUSE <.MEM_12> _ifc__16 = link_10->D.1530; _ifc__17 = BIT_INSERT_EXPR <_ifc__16, 0, 0 (1 bits)>; # VUSE <_7> link_8 = link_10->c_node.next; if (dev_5(D) != link_8) goto ; [89.00%] else goto ; [11.00%] ;;succ: 6 ;;4 Now _7 is in the free list, but there are still two references to it in bb3. We seem to have removed the volatile asm which defined _7. I haven't dug any further than that. Testcase: struct list_head { struct list_head *next; }; struct device { struct list_head suppliers; }; struct device_link { struct list_head c_node; int direct_complete:1; }; void dpm_clear_superiors_direct_complete (struct device *dev) { struct device_link *link; for (; &link->c_node != (&dev->suppliers);) { link->direct_complete = 0; __asm__ __volatile__ ("":::"memory"); link = link->c_node.next; } } Compile with -O2 -mabi=64 -I./ on mips64-linux-gnu.
[Bug target/101697] [11/12/13 regression] ICE compiling uClibc-ng for h8300-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101697 --- Comment #6 from Jeffrey A. Law --- So this issue has come up again in the context of LRA conversion which happens to trip over the same bug, but with a different testcase. At the core of this problem is that reload and LRA will both generate invalid RTL when performing register eliminations. Specifically, they will create an autoinc addressing mode where the incremented/decremented register is used elsewhere in the same insn as a source operand. Such RTL has been considered invalid as long as I can remember. The H8 backend does try to prevent this behavior by checking for this scenario and rejecting such insns in the insn condition. But both reload and LRA make substitutions in the original insn without validating the resulting insn (which would have failed). Even if they did try to validate the resulting insn, neither has a code generation strategy to deal with a failed substitution during register eliminations. Paul K. indicated how the pdp11 port handles these cases with constraints. Using constraints alone was insufficient to fix this problem, but using constraints in conjunction with the existing insn condition checks does seem to fix this problem. I'm currently upstreaming the various bits to make that happen.
[Bug target/101697] [11/12 regression] ICE compiling uClibc-ng for h8300-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101697 Jeffrey A. Law changed: What|Removed |Added Summary|[11/12/13 regression] ICE |[11/12 regression] ICE |compiling uClibc-ng for |compiling uClibc-ng for |h8300-linux |h8300-linux --- Comment #8 from Jeffrey A. Law --- Fixed for gcc-13. No plans to backport.
[Bug tree-optimization/107275] [13 Regression] Recent ifcvt changes resulting in references to SSA_NAME on free list
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107275 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Jeffrey A. Law --- Fixed by Andre's patch on the trunk.
[Bug other/107353] [13 regression] Numerous ICEs after 13-3416-g1d561e1851c466
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353 Jeffrey A. Law changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2022-10-21 CC||law at gcc dot gnu.org --- Comment #2 from Jeffrey A. Law --- Note this is also failing on embedded targets. For example nds32le-elf is failing emutls-3.c with an ICE.
[Bug target/103722] [12 Regression] ICE in extract_constrain_insn building glibc for SH4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103722 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2022-01-05 CC||law at gcc dot gnu.org --- Comment #2 from Jeffrey A. Law --- I'll confirm. My tester is seeing the same failure building the linux kernel on sh4/sh4eb and was bisected to the same commit. Isn't a move cost 2 special to the old reload pass, causing it to avoid various checks on simple move insns? If so, wouldn't returning any other value be helpful, and something closer to 2 than 7 would perturb the generated code less? Regardless, I'd approve the patch as-is if you submit it.
[Bug bootstrap/103974] [12 Regression] ICE in ira_flattening building libstdc++ with r12-6415-g01f3e6a40e72
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103974 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #6 from Jeffrey A. Law --- Richard -- if you need an alternate testcase, reach out -- I've seen what is likely the same failure on lm32-elf building libgcc. I have the un-reduced .i if it'd be useful.
[Bug bootstrap/103974] [12 Regression] ICE in ira_flattening building libstdc++ with r12-6415-g01f3e6a40e72
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103974 --- Comment #8 from Jeffrey A. Law --- ACK. I wandered through the tester this morning, the vast majority of the current failures are the ira_flattening ICE. Though I think there's likely one other ICE in IRA (frv-elf, ICE in check_allocation). I'll restart everything once you've got your patch ready and file fresh bugs for anything that's still problematical after the ira_flattening ICE is resolved.
[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #4 from Jeffrey A. Law --- FWIW, the patch Jakub identified is causing similar testsuite regressions across ~30 targets at this point.
[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977 --- Comment #6 from Jeffrey A. Law --- And just to follow-up. With the patch that was committed to the trunk, the 30+ targets that were previously failing are now working. A few are still building, but I expect them to succeed. mips* is failing, but I suspect it's the recent allocator changes, not the vectorizer changes causing problems.
[Bug c++/78655] gcc doesn't exploit the fact that the result of pointer addition can not be nullptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 --- Comment #11 from Jeffrey A. Law --- We can assume that the result of a POINTER_PLUS is non-null if either argument is non-null. So X + constant is always non-null. X + Y would be non-null if either X or Y is known to be non-null. If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null.
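A minimal sketch of the kind of source the rule above would help, assuming the middle end applies it. The function name `is_null_after_add` is made up for illustration and does not come from the bug report. Since `p + 1` is a POINTER_PLUS with a non-zero constant, the comparison against null could in principle be folded to false.

```c
#include <stddef.h>

/* Hypothetical example: P + 1 is a pointer addition with a non-zero
   constant, so under the rule in the comment the result is never null
   and the test below could be folded away.  */
int
is_null_after_add (char *p)
{
  char *q = p + 1;
  return q == NULL;
}
```

Whether a given GCC version actually folds the comparison is exactly what the bug is about, so the sketch only shows the shape of the input, not a guaranteed result of optimization.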
[Bug middle-end/104067] [12 Regression] wrong code compiling QEMU since r12-4790-g4b3a325f07acebf4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104067 --- Comment #5 from Jeffrey A. Law --- I briefly looked at the other BZ last week, but didn't make much headway. The first thing that stood out was why are we threading around the loop. I thought that was disabled. Anyway, Aldy and/or I will take both of these in the coming days.
[Bug tree-optimization/103721] [12 regression] wrong code generated for loop with conditional since r12-4790-g4b3a325f07acebf4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103721 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #4 from Jeffrey A. Law --- But what doesn't make any sense here is the folding in this block: [local count: 1073741824]: # searchVolume_5 = PHI # currentVolume_6 = PHI _2 = searchVolume_5 != currentVolume_6; _3 = searchVolume_5 != 0; _4 = _2 & _3; if (_4 != 0) goto ; [89.00%] else goto ; [11.00%] In fold_using_range::range_of_range_op we have: (gdb) p debug_tree (op1) unit-size align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0x7780e5e8 precision:32 min max pointer_to_this > visited var def_stmt searchVolume_5 = PHI version:5> $113 = void (gdb) p debug_tree (op2) unit-size align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0x7780e5e8 precision:32 min max pointer_to_this > visited var def_stmt currentVolume_6 = PHI version:6> $114 = void (gdb) p rel $115 = EQ_EXPR If I'm reading the code correctly I think that means that the ranger has determined that _5 and _6 are equal. 
But I don't see how it can possibly make that determination with this CFG: int f (int world) { int currentVolume; int searchVolume; int ipos.0_1; _Bool _2; _Bool _3; _Bool _4; ;; basic block 2, loop depth 0 ;;pred: ENTRY goto ; [100.00%] ;;succ: 9 ;; basic block 3, loop depth 1 ;;pred: 9 ipos.0_1 = ipos; if (ipos.0_1 != 0) goto ; [50.00%] else goto ; [50.00%] ;;succ: 8 ;;4 ;; basic block 4, loop depth 1 ;;pred: 3 ;;succ: 8 ;; basic block 8, loop depth 1 ;;pred: 4 ;;3 # searchVolume_11 = PHI <1(4), 0(3)> # currentVolume_8 = PHI ;;succ: 9 ;; basic block 9, loop depth 1 ;;pred: 8 ;;2 # searchVolume_5 = PHI # currentVolume_6 = PHI _2 = searchVolume_5 != currentVolume_6; _3 = searchVolume_5 != 0; _4 = _2 & _3; if (_4 != 0) goto ; [89.00%] else goto ; [11.00%] ;;succ: 3 ;;7 ;; basic block 7, loop depth 0 ;;pred: 9 return currentVolume_6; ;;succ: EXIT } This feels like it's got to be a problem in the equivalence handling -- it's largely outside the threader. My recollection of equivalences in loops is that they're exceedingly hard to get correct once you follow the backedge -- particularly since you have to invalidate some equivalences once you traverse that backedge. Finding the set that needed to be invalidated was expensive and the book keeping turned out to be too hard to do reliably so I ripped it all out. How does equivalence handling in the Ranger world work once you traverse the backedge of a loop?
[Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388 --- Comment #5 from Jeffrey A. Law --- We thread one edge at a time, so we don't know ahead of time how many copies there would be. It could be restructured to go ahead and register these threads, then compute the copy cost on a more global basis. That would allow us to bump up the threshold to register the thread, but still reject things later if the cost appears to be too high. The book keeping necessary to do that would actually be step #0 for the real solution which would be to fix the new copier to coalesce cases where multiple incoming edges thread to the same outgoing edge in a manner similar to what tree-ssa-threadupdate does.
[Bug tree-optimization/103721] [12 regression] wrong code generated for loop with conditional since r12-4790-g4b3a325f07acebf4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103721 --- Comment #9 from Jeffrey A. Law --- I think Andrew has raised a really interesting issue. If the relation code is designed around seeing things in dominator order, then don't we have to stop using it once we traverse any edge where the edge source does not dominate the edge destination (assume this is a partial graph rather than a multi-entry function ;-) 1 2 3 | \ / |4 | / \ +->5 6 / \ 7 8 Note how BB4 does not dominate BB5. If we try to thread something like 2->4->5->?, then can't we run into problems with the equivalence handling as well, even though we're not dealing with a loop?
[Bug rtl-optimization/104153] New: [12 Regression] ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104153 Bug ID: 104153 Summary: [12 Regression] ICE due to recent ifcvt changes Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: law at gcc dot gnu.org Target Milestone: --- Created attachment 52249 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52249&action=edit Testcase or1k-elf has started regressing building newlib after this patch: commit aa8cfe785953a0e87d2472311e1260cd98c605c0 (HEAD) Author: Robin Dapp Date: Wed Jan 19 17:36:36 2022 +0100 ifcvt: Try re-using CC for conditional moves. Following up on the previous patch, this patch makes noce_convert_multiple emit two cmov sequences: The same one as before and a second one that tries to re-use the existing CC. Then their costs are compared and the cheaper one is selected. gcc/ChangeLog: * ifcvt.cc (cond_exec_get_condition): New parameter to allow getting the reversed comparison. (try_emit_cmove_seq): New function to facilitate creating a cmov sequence. (noce_convert_multiple_sets): Create two sequences and use the less expensive one. It's faulting a bit later in the RTL pipeline, but I think what's going on is we're modifying an insn in-place and don't update the DF information leading to a DF verification failure later. I'd bet if we did a full DF verify after ifcvt we'd see the failure earlier. 
Compile the attached code with -O2 on an or1k-elf cross-compiler to get: dump file: j.c.276r.cprop3 ../../../../../../..//newlib-cygwin/newlib/libm/math/s_floor.c: In function ‘floor’: ../../../../../../..//newlib-cygwin/newlib/libm/math/s_floor.c:121:1: internal compiler error: in df_refs_verify, at df-scan.cc:4003 0xca95c6 df_refs_verify /home/jlaw/test/gcc/gcc/df-scan.cc:4003 0xca9836 df_insn_refs_verify /home/jlaw/test/gcc/gcc/df-scan.cc:4086 0xca99d7 df_bb_verify /home/jlaw/test/gcc/gcc/df-scan.cc:4119 0xca9fa5 df_scan_verify() /home/jlaw/test/gcc/gcc/df-scan.cc:4240 0xc94512 df_verify() /home/jlaw/test/gcc/gcc/df-core.cc:1818 0xc92e0c df_analyze_1 /home/jlaw/test/gcc/gcc/df-core.cc:1214 0xc931d0 df_analyze() /home/jlaw/test/gcc/gcc/df-core.cc:1290 0x1dc4925 execute_rtl_cprop /home/jlaw/test/gcc/gcc/cprop.cc:1925 0x1dc4a24 execute /home/jlaw/test/gcc/gcc/cprop.cc:1964
[Bug rtl-optimization/104154] New: [12 Regression] Another ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 Bug ID: 104154 Summary: [12 Regression] Another ICE due to recent ifcvt changes Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: law at gcc dot gnu.org Target Milestone: --- Created attachment 52251 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52251&action=edit Testcase I didn't bisect this to the exact change, but I'm highly confident this is another issue with the recent ifcvt work. On arc-elf, compile the attached testcase with -O2 to trigger an assert in the arc backend: during RTL pass: ce1 ../../../../../../..//newlib-cygwin/newlib/libc/stdlib/mprec.c: In function ‘__multiply’: ../../../../../../..//newlib-cygwin/newlib/libc/stdlib/mprec.c:416:1: internal compiler error: in gen_compare_reg, at config/arc/arc.cc:2259 0x17e7fe6 gen_compare_reg(rtx_def*, machine_mode) /home/jlaw/test/gcc/gcc/config/arc/arc.cc:2259 0x1ded7ab gen_movsicc(rtx_def*, rtx_def*, rtx_def*, rtx_def*) /home/jlaw/test/gcc/gcc/config/arc/arc.md:1621 0x1169216 rtx_insn* insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const /home/jlaw/test/gcc/gcc/recog.h:407 0x1168b00 maybe_gen_insn(insn_code, unsigned int, expand_operand*) /home/jlaw/test/gcc/gcc/optabs.cc:7962 0x1168df6 maybe_expand_insn(insn_code, unsigned int, expand_operand*) /home/jlaw/test/gcc/gcc/optabs.cc:7993 0x1160699 emit_conditional_move_1 /home/jlaw/test/gcc/gcc/optabs.cc:5030 0x1160475 emit_conditional_move(rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*, machine_mode) /home/jlaw/test/gcc/gcc/optabs.cc:4980 0x1f7ec9f noce_emit_cmove /home/jlaw/test/gcc/gcc/ifcvt.cc:1753 0x1f82910 try_emit_cmove_seq /home/jlaw/test/gcc/gcc/ifcvt.cc:3179 0x1f82ec1 noce_convert_multiple_sets /home/jlaw/test/gcc/gcc/ifcvt.cc:3396 0x1f83682 noce_process_if_block /home/jlaw/test/gcc/gcc/ifcvt.cc:3616 [ ... 
] At first glance this appears to be a problem with feeding an existing comparison back through the target bits. The arc backend simply isn't prepared to deal with that. But it may be a higher level issue. Anyway, it's clearly a regression relative to gcc-11 as the arc port will no longer build newlib.
[Bug rtl-optimization/104153] [12 Regression] ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104153 --- Comment #2 from Jeffrey A. Law --- I'd bet the or1k expanders are changing the passed-in RTL.
[Bug target/104028] M68k: Error: value -16034 out of range for switch tables in some cases with optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104028 Jeffrey A. Law changed: What|Removed |Added Priority|P3 |P4 CC||law at gcc dot gnu.org
[Bug middle-end/103483] [12 regression] context-sensitive ranges change triggers stringop-overread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #19 from Jeffrey A. Law --- Just threading doesn't create nonsensical blocks out of thin air. It may expose nonsensical code that already existed though. That's inherent in the path isolating nature of the transformation. But that's not what's going on in the example you posted. What's going on there is we have a memcpy where we know the source has only 3 bytes of storage, but we *may* pass in a length of 4 for the memcpy as "mystrlen" is opaque. That is precisely the kind of scenario where these warnings are supposed to trigger. I would strongly disagree with the recommendation not to warn because of information deduced from conditionals on the path. That would cripple the warnings in precisely the case where they're the most valuable IMHO.
[Bug rtl-optimization/104153] [12 Regression] ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104153 --- Comment #4 from Jeffrey A. Law --- That seems to get newlib building. But I'm also seeing a ton of testsuite failures. Not sure what's going on with those yet. I'll have to make some time tomorrow to look at them a bit deeper.
[Bug middle-end/103483] [12 regression] context-sensitive ranges change triggers stringop-overread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483 --- Comment #21 from Jeffrey A. Law --- Yes, the wording is dreadful. Yes we need a better way to express to the user the paths followed and how they impacted the analysis. As for suppressing. There's not a great option here, which isn't a huge surprise. In this specific case we'd need to be able to make mystrlen less opaque, particularly WRT its return value. Even if we had a solution to do that, it's still far from good IMHO -- you end up with annotations all over the place.
[Bug tree-optimization/95424] Failure to optimize division with numerator of 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95424 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #8 from Jeffrey A. Law --- Fixed on the trunk.
[Bug middle-end/19987] [meta-bug] fold missing optimizations in general
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 Bug 19987 depends on bug 95424, which changed state. Bug 95424 Summary: Failure to optimize division with numerator of 1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95424 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug testsuite/70230] 11 test regressions when building GCC 6 with --enable-default-ssp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70230 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED CC||law at gcc dot gnu.org --- Comment #7 from Jeffrey A. Law --- Fixed on the trunk.
[Bug rtl-optimization/104153] [12 Regression] ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104153 --- Comment #7 from Jeffrey A. Law --- Sorry, I wasn't able to debug those additional failures until the weekend. They ultimately turned out to be a bug in some recent newlib refactoring. I got clean or1k builds once I applied your patch and fixed up newlib. Are you going to submit the patch officially?
[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771 --- Comment #34 from Jeffrey A. Law --- I've always wanted to see us be able to push something like those matched conversions down through the PHI. That would make the code look like: if (x.1_1 > 255) goto ; [INV] else goto ; [INV] : _2 = -x_5(D); _3 = _2 >> 31; goto ; [INV] : : # tmp = PHI <_3, x_5(4)> iftmp.0_4 = (uint8_t) tmp And presumably we'd clean up the empty bb4 which would in turn unblock other optimizations. Is that what you're working on?
[Bug rtl-optimization/104400] New: [12 Regression] v850e lra/reload failure after recent change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400 Bug ID: 104400 Summary: [12 Regression] v850e lra/reload failure after recent change Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: law at gcc dot gnu.org Target Milestone: --- After this change: commit 85419ac59724b7ce710ebb4acf03dbd747edeea3 (HEAD, refs/bisect/bad) Author: Vladimir N. Makarov Date: Fri Jan 21 13:34:32 2022 -0500 [PR103676] LRA: Calculate and exclude some start hard registers for reload pseudos LRA and old reload pass uses only one register class for reload pseudos even if operand constraints contain more one register class. Let us consider constraint 'lh' for thumb arm which means low and high thumb registers. Reload pseudo for such constraint will have general reg class (union of low and high reg classes). Assigning the last low register to the reload pseudo is wrong if the pseudo is of DImode as it requires two hard regs. But it is considered OK if we use general reg class. The following patch solves this problem for LRA. gcc/ChangeLog: PR target/103676 [ ... ] The v850e-elf port will no longer build newlib due to a spill failure. I've narrowed the test down, but haven't done any debugging to see if this is really an LRA issue or a backend issue. 
Compile with -O2 -mv850e3v5 to trigger: ./cc1 -O2 -mv850e3v5 j.c frob Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> {heap 1200k} {heap 1200k} {heap 1200k} {heap 1200k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k}Streaming LTO {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k} {heap 1616k}Assembling functions: frob j.c: In function 'frob': j.c:7:1: error: unable to find a register to spill 7 | } | ^ j.c:7:1: error: this is the insn: (insn 22 26 25 2 (set (mem/c:DI (reg/f:SI 34 .fp) [1 %sfp+-8 S8 A32]) (reg:DI 52)) "j.c":4:7 1 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 52) (nil))) during RTL pass: reload j.c:7:1: internal compiler error: in lra_split_hard_reg_for, at lra-assigns.cc:1837 double frob (double r) { r = -r; return r; }
[Bug target/97040] incorrect fused multiply add/subtract instruction generated from C code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97040 Jeffrey A. Law changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |law at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2022-02-06 --- Comment #1 from Jeffrey A. Law --- I think I know what's going on here...
[Bug rtl-optimization/104154] [12 Regression] Another ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 --- Comment #4 from Jeffrey A. Law --- Given we've run this code on a pretty wide variety of targets, I'm not too concerned. The arc issue was the last one I'm aware of related to your ifcvt changes.
[Bug tree-optimization/24021] VRP does not work with floating points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #7 from Jeffrey A. Law --- Very cool. And no, I'm not enough of an expert on the FP side to shepherd that though. I would expect it to be exceptionally tricky on the solver side. Probably the most useful things I've come across would be knowing if a particular value can or cannot have certain special values. ie, [qs]NaN, +-0.0, +-Inf. Knowing how a value relates to 0 is also quite helpful. ie, > 0, < 0 and the like.
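The properties listed above could be tracked as a small set of "may be" bits per value. The following is only a sketch under that assumption, not how GCC represents floating-point ranges; the enum and both function names are hypothetical.

```c
#include <math.h>

/* Hypothetical property bits: what a floating-point value may be.  */
enum fp_maybe
{
  MAYBE_NAN = 1,
  MAYBE_INF = 2,
  MAYBE_ZERO = 4,
  MAYBE_NEG = 8,
  MAYBE_POS = 16
};

/* Property bits describing one concrete value.  */
unsigned
fp_props_of (double x)
{
  unsigned props = 0;

  if (isnan (x))
    return MAYBE_NAN;
  if (isinf (x))
    props |= MAYBE_INF;
  if (x == 0.0)
    props |= MAYBE_ZERO;	/* Matches both +0.0 and -0.0.  */
  else if (x < 0.0)
    props |= MAYBE_NEG;
  else
    props |= MAYBE_POS;
  return props;
}

/* At a merge point the result may have any property of either input.  */
unsigned
fp_props_join (unsigned a, unsigned b)
{
  return a | b;
}
```

A consumer could then answer questions such as "can this value be a NaN?" by testing a single bit, which is the sort of query the comment describes as most useful.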
[Bug rtl-optimization/104400] [12 Regression] v850e lra/reload failure after recent change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2022-02-09 Status|UNCONFIRMED |NEW --- Comment #2 from Jeffrey A. Law --- NP on the timing. My biggest concern (as always) is whether or not this is a generic issue or a bug in the v850 target files. The former is obviously much more important. If it starts to look like a target issue, then feel free to punt it to me. While I don't know the v850 fp bits, I have retained a fair amount of generic v850 knowledge over the decades :-)
[Bug target/97040] incorrect fused multiply add/subtract instruction generated from C code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97040 --- Comment #2 from Jeffrey A. Law --- So a bit more background. I was doing some maintenance work on the v850 a few years back and noticed an issue with e3v5 testing and FMAs. I poked around a bit and had largely ruled out the generic bits of FMA generation as the problem. But I was very short of time and didn't have reasonable docs on the implementation of FMAs for the newer versions of the v850 (e3v5 was added long after I did the original port). Renesas's site isn't great for finding the relevant docs as you have to map from something like e3v5 to a part within the v850 family :( So I set it aside. Your report caused me to dig a bit further. I finally found a mapping from e3v5 to a specific part on Renesas's site (RH850G3KH) and was able to review the specification of these instructions. Again I managed to convince myself that our gimple FMA generation was correct. I also convinced myself that the RTL patterns for the v850 match the documentation from Renesas. Then it finally hit me. The v850 patterns are using pattern names that require specific semantics. ie, if the expanders see a pattern fnma they expect that pattern will implement very specific semantics. In particular they expect that the negation will apply to one of the arguments to the multiplication. But the instruction emitted in those patterns negates the final result. ie, GCC expects this semantic from the fnma pattern: (fma (neg op0) (op1) (op2)). But the v850 chip implements: (neg (fma (op0) (op1) (op2))). So the bug is using the "fnma" and "fnms" names for those patterns. The RTL is fine, it's strictly a pattern naming issue.
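The difference between the two semantics can be seen with ordinary arithmetic. This is a sketch only: the function names are invented for illustration, and a plain multiply-add stands in for the fused operation because only the position of the negation matters here, not the rounding.

```c
/* Semantics GCC expects from a pattern named "fnma":
   (fma (neg op0) op1 op2), i.e. the negation applies to a multiplicand.  */
double
fnma_as_gcc_expects (double op0, double op1, double op2)
{
  return (-op0) * op1 + op2;
}

/* Semantics of the instruction as described in the comment:
   (neg (fma op0 op1 op2)), i.e. the negation applies to the final result.  */
double
fnma_as_v850_computes (double op0, double op1, double op2)
{
  return -(op0 * op1 + op2);
}
```

For op0 = 2, op1 = 3, op2 = 1 the first form gives -5 and the second gives -7, so code expanded through the misnamed pattern would compute the wrong value whenever op2 is non-zero.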
[Bug rtl-optimization/104400] [12 Regression] v850e lra/reload failure after recent change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Jeffrey A. Law --- And I can confirm newlib builds again. Unfortunately, the last successful build was too far in the past and was wiped by Jenkins. So I can't compare the testsuite results with that prior successful run :(
[Bug target/97040] incorrect fused multiply add/subtract instruction generated from C code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97040 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Jeffrey A. Law --- Fixed on the trunk.
[Bug rtl-optimization/104153] [12 Regression] ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104153 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #11 from Jeffrey A. Law --- Fixed on the trunk.
[Bug rtl-optimization/104154] [12 Regression] Another ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 --- Comment #5 from Jeffrey A. Law --- Created attachment 52432 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52432&action=edit Testcase #2
[Bug rtl-optimization/104154] [12 Regression] Another ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2022-02-13 --- Comment #6 from Jeffrey A. Law --- So the patch gets the first testcase working, but we fail shortly thereafter in a similar way on another testcase. Compile with -O2 using an arc-elf cross compiler.
[Bug rtl-optimization/104154] [12 Regression] Another ICE due to recent ifcvt changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 --- Comment #8 from Jeffrey A. Law --- So the updated patch fixes the arc build regressions. I haven't looked at the thread with Segher, but I will as soon as I can. Mostly just wanted to let you know that the updated patch does indeed get the port building again.
[Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981 Jeffrey A. Law changed: What|Removed |Added Priority|P3 |P2 --- Comment #4 from Jeffrey A. Law --- I have no strong opinions about this specific testcase. More generally I am in agreement with Zdenek and others that the threaders should not be peeling iterations off loops or rotating loops. Fundamentally the threaders don't have the kind of costing model to know if peeling an iteration off is profitable or not. So even after the loop optimizers are done, I'd still lean against peeling since if it was profitable it should have been done by the loop optimizer or vectorizer. So unless someone can show this is a significant issue in real world code, I would argue that it ought to be fixed by including the possibility of eliminating unreachable code in the profitability analysis for loop peeling by the loop optimizers and possibly the unroller (for this specific example).
[Bug tree-optimization/103161] New: [12 Regression] Better ranges cause builtin-sprintf-warn-16.c failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103161

Bug ID: 103161
Summary: [12 Regression] Better ranges cause builtin-sprintf-warn-16.c failure
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at gcc dot gnu.org
Target Milestone: ---

On a variety of platforms builtin-sprintf-warn-16.c has started failing since converting strlen to use Ranger.

Tests that now fail, but worked before (6 tests):

or1k-sim: gcc.dg/tree-ssa/builtin-sprintf-warn-16.c (test for warnings, line 142)
or1k-sim: gcc.dg/tree-ssa/builtin-sprintf-warn-16.c (test for warnings, line 243)
or1k-sim: gcc.dg/tree-ssa/builtin-sprintf-warn-16.c (test for excess errors)

The excess errors and line 142 failure are due to getting tighter ranges out of Ranger which seems to have confused the wrap-around bits in the sprintf warnings.

A reduced testcase for or1k-elf that you can trigger with a cross:

# 0 "k.c"
# 0 ""
# 0 ""
# 1 "k.c"
typedef unsigned int size_t;
typedef unsigned int wchar_t;
void sink (void*);
void* get_value (void);
# 22 "k.c"
extern char buf[1];
typedef signed long long sint128_t;
typedef unsigned long long uint128_t;
const sint128_t sint128_max = (sint128_t)1 << (sizeof sint128_max * 8 - 2);
const sint128_t uint128_max = (uint128_t)-1;
void test_width_var (void)
{
  {
    extern unsigned w;
    if (w < 5 || (unsigned)-1 - 7 < w)
      w = 5;
    __builtin_sprintf (buf + 1, "%*u", w, *(int*)get_value ());
    sink (buf);
  }
}

The relevant bits from the .strlen dump:

Old:
! j.c:52: __builtin_sprintf: objsize = 0, fmtstr = "%*u"
!   Directive 1 at offset 0: "%*u", width in range [0, 2147483648]
!     Result: 1, 1, 2147483648, 2147483648 (1, 1, 2147483648, 2147483648)
    Directive 2 at offset 3: "", length = 1

New:
! k.c:52: __builtin_sprintf: objsize = 0, fmtstr = "%*u"
!   Directive 1 at offset 0: "%*u", width in range [5, 4294967288]
!     Result: 5, 5, -1, 4294967288 (5, 5, -1, -1)
    Directive 2 at offset 3: "", length = 1

AFAICT we've got a tighter range from Ranger, which in turn is confusing some of the wrap-around logic.

Martin, can you take a look at this?
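A minimal sketch in C of the arithmetic behind the new dump range, assuming a 32-bit unsigned and a two's-complement int as on or1k. The helpers clamp_width and width_as_int are hypothetical names used for illustration only, not GCC internals: the first repeats the clamp from the reduced testcase, the second shows what a bound such as 4294967288 looks like when its bits are read as the int that a '*' width is fetched as.

```c
#include <limits.h>

/* Hypothetical helper: the same clamp as in the reduced testcase.
   Afterwards w lies in [5, UINT_MAX - 7], which is [5, 4294967288]
   when unsigned is 32 bits wide.  */
unsigned clamp_width (unsigned w)
{
  if (w < 5 || (unsigned)-1 - 7 < w)
    w = 5;
  return w;
}

/* Hypothetical helper: the value obtained when the bits of W are read
   as a two's-complement int.  Written without an out-of-range cast so
   the result does not rely on implementation-defined conversion.  */
int width_as_int (unsigned w)
{
  if (w <= (unsigned) INT_MAX)
    return (int) w;
  return -(int) (UINT_MAX - w) - 1;
}
```

Under these assumptions the upper end of the range, UINT_MAX - 7, reads back as -8, and printf treats a negative '*' width as the '-' flag followed by a positive width. That is presumably the kind of wrap-around the warning code has to account for once the range no longer starts at 0.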
[Bug tree-optimization/103161] [12 Regression] Better ranges cause builtin-sprintf-warn-16.c failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103161 --- Comment #1 from Jeffrey A. Law --- I suspect the same underlying issue is affecting the test on line #243 as well.
[Bug tree-optimization/103182] New: [12 Regression] Recent change causes code correctness regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103182

Bug ID: 103182
Summary: [12 Regression] Recent change causes code correctness regression
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at gcc dot gnu.org
Target Milestone: ---

This change:

d70ef65692fced7ab72e0aceeff7407e5a34d96d is the first bad commit
commit d70ef65692fced7ab72e0aceeff7407e5a34d96d
Author: Jan Hubicka
Date: Wed Nov 10 13:08:41 2021 +0100

    Make EAF flags more regular (and expressive)

    I hoped that I am done with EAF flags related changes, but while looking into
    the Fortran testcases I noticed that I have designed them in unnecessarily
    restricted way. I followed the scheme of NOESCAPE and NODIRECTESCAPE which is
    however the only property that is naturally transitive.

    This patch replaces the existing flags by 9 flags:
    [ ... ]

is causing gcc.dg/torture/pr45962-2.c to fail on or1k-elf and a few other platforms at -O2.

You should be able to reproduce this with just a cross compiler since the .optimized dump after the change above just calls "foo" then aborts. The check for the value of i after the call to foo has been eliminated.

There's a bit of dodgy code in foo() from an aliasing standpoint. I haven't looked at it real closely though.
[Bug tree-optimization/103182] [12 Regression] Recent change causes code correctness regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103182 --- Comment #4 from Jeffrey A. Law --- And just to be clear, Andrew's c#1 is correct. It's 45967-2.c.
[Bug tree-optimization/103226] New: [12 Regression] Recent change to copy-headers causes execution failure for gcc.dg/torture/pr80974
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103226

Bug ID: 103226
Summary: [12 Regression] Recent change to copy-headers causes execution failure for gcc.dg/torture/pr80974
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at gcc dot gnu.org
Target Milestone: ---

This change:

commit e82c382971664d6fd138cc36020db4b1a91885c6
Author: Aldy Hernandez
Date: Wed Nov 10 13:21:59 2021 +0100

    Allow loop header copying when first iteration condition is known.

causes gcc.dg/torture/pr80974.c to fail with -O2 on bfin-elf.

I haven't debugged it in any significant way other than bisection to the above change and quickly looking at the before/after dumps. Things diverge at the .ch2 dump (no surprise there). You may ultimately have to set up a cross environment to debug this one.

One other tidbit: before this change the test finishes relatively fast; afterwards it takes a LOT longer. Maybe we're wrapping a loop or something like that. Dunno. It may ultimately not be relevant.

Oh, if you end up doing a binutils, gcc & newlib build and want to test things yourself using dejagnu you might need a suitable baseboards file. Here's mine, which was copied and edited from some other port...

# Copyright (C) 1997-2016 Free Software Foundation, Inc.
#
# This file is part of DejaGnu.
#
# DejaGnu is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# DejaGnu is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with DejaGnu; if not, write to the Free Software Foundation,
# Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.

# This is a list of toolchains that are supported on this board.
set_board_info target_install {bfin-elf}

# Load the generic configuration for this board. This will define a basic set
# of routines needed by the tool to communicate with the board.
load_generic_config "sim"

# basic-sim.exp is a basic description for the standard Cygnus simulator.
load_base_board_description "basic-sim"

# "bfin" is the name of the sim subdir.
setup_sim bfin

# No multilib options needed by default.
process_multilib_options ""

# We only support newlib on this target. We assume that all multilib
# options have been specified before we get here.
set_board_info compiler "[find_gcc]"
set_board_info cflags "[libgloss_include_flags] [newlib_include_flags]"
set_board_info ldflags "-msim [libgloss_link_flags] [newlib_link_flags]"

# No linker script needed.
set_board_info ldscript ""

# The simulator doesn't return exit statuses and we need to indicate this;
# the standard GCC wrapper will work with this target.
set_board_info needs_status_wrapper 1

# Doesn't pass arguments or signals, can't return results, and doesn't
# do inferiorio.
set_board_info noargs 1
set_board_info gdb,nosignals 1
set_board_info gdb,noresults 1
set_board_info gdb,noinferiorio 1
set_board_info gcc,stack_size 4096
[Bug tree-optimization/103235] New: [12 Regression] Recent change to atomics triggers ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103235

Bug ID: 103235
Summary: [12 Regression] Recent change to atomics triggers ICE
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at gcc dot gnu.org
Target Milestone: ---

Created attachment 51790
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51790&action=edit
Testcase

This change:

commit fb161782545224f55ba26ba663889c5e6e9a04d1
Author: liuhongt
Date: Mon Oct 25 13:59:51 2021 +0800

    Improve integer bit test on __atomic_fetch_[or|and]_* returns

is causing various ports (csky, mips*, s390, maybe others) to fail to build glibc:

bash-5.1# ./cc1 -O2 /tmp/pthread_cancel.i -quiet
pthread_cancel.c: In function '__pthread_cancel':
pthread_cancel.c:60:1: error: type mismatch in binary expression
int
unsigned int
int
_36 = _4 & 8;
during GIMPLE pass: fab
pthread_cancel.c:60:1: internal compiler error: verify_gimple failed
0x11a2ef3 verify_gimple_in_cfg(function*, bool)
        ../../../gcc/gcc/tree-cfg.c:5577
0xff2179 execute_function_todo
        ../../../gcc/gcc/passes.c:2042
0xff1115 do_per_function
        ../../../gcc/gcc/passes.c:1687
0xff2369 execute_todo
        ../../../gcc/gcc/passes.c:2096

Attached is a testcase that ought to trigger with a csky-linux-gnu cross compiler. I haven't done any debugging other than to bisect to the change above.
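For orientation, here is a sketch in C of the kind of source pattern the bisected commit is about, going only by its subject line: the value returned by __atomic_fetch_or is masked with the same bit that was just set. The function name test_and_set_bit3 is hypothetical, and the unsigned/int mix is an assumption chosen because the verifier message shows an int result computed from an unsigned int operand. This is not the attached testcase and it is not known to reproduce the ICE.

```c
#include <stdbool.h>

/* Hypothetical example.  Atomically sets bit 3 (mask 8) in *FLAGS and
   reports whether that bit was already set beforehand.  */
bool test_and_set_bit3 (unsigned *flags)
{
  /* __atomic_fetch_or is a GCC builtin; it returns the old value.
     Storing it in an int mixes signedness on purpose (assumption).  */
  int old = __atomic_fetch_or (flags, 8, __ATOMIC_SEQ_CST);
  return (old & 8) != 0;
}
```

Called twice on a zero-initialized word, the first call reports false and leaves the word at 8, the second reports true.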
[Bug tree-optimization/103235] [12 Regression] Recent change to atomics triggers ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103235 --- Comment #2 from Jeffrey A. Law ---
Can you please double-check? It just reproduced for me. Perhaps you were missing -I./ which is sometimes needed for cross toolchains to *-linux.

[jlaw@dl360p gcc]$ ./cc1 -O2 pthread_cancel.i -I./ -quiet -w
pthread_cancel.c: In function ‘__pthread_cancel’:
pthread_cancel.c:60:1: error: type mismatch in binary expression
int
unsigned int
int
_36 = _4 & 8;
during GIMPLE pass: fab
pthread_cancel.c:60:1: internal compiler error: verify_gimple failed
0x134a788 verify_gimple_in_cfg(function*, bool)
        /home/jlaw/test/gcc/gcc/tree-cfg.c:5577
0x1198471 execute_function_todo
        /home/jlaw/test/gcc/gcc/passes.c:2042
0x119740d do_per_function
        /home/jlaw/test/gcc/gcc/passes.c:1687
0x1198661 execute_todo
        /home/jlaw/test/gcc/gcc/passes.c:2096
Please submit a full bug report,

[jlaw@dl360p gcc]$ pushd /home/jlaw/test/gcc
~/test/gcc ~/test/obj/csky-linux-gnu/obj/gcc/gcc
[jlaw@dl360p gcc]$ git status .
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add ..." to include in what will be committed)
        gcc/J
        gcc/j

nothing added to commit but untracked files present (use "git add" to track)
[jlaw@dl360p gcc]$ git log -n1 HEAD
commit 8a601f9bc45f9faaa91f18d58ba71b141acff701 (HEAD -> master, origin/trunk, origin/master, origin/HEAD)
Author: Aldy Hernandez
Date: Sun Nov 14 16:17:36 2021 +0100

    Remove gcc.dg/pr103229.c
[Bug tree-optimization/103226] [12 Regression] Recent change to copy-headers causes execution failure for gcc.dg/torture/pr80974
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103226 --- Comment #3 from Jeffrey A. Law --- Agreed on P1 until we understand. If it's target specific P4 seems appropriate. I don't see this failure on any other target in the tester.
[Bug tree-optimization/103192] [12 Regression] ICE on libgomp target-in-reduction-2.{C,c}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103192 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #16 from Jeffrey A. Law --- Based on c#15 there must be an equivalence between outer_ctx_1389 and iftmp.2373_1515 that dominates the condition. The equivalence doesn't have to be a global equivalence though. I'd start by chasing that down to see if it makes sense. I'm going to hazard a guess it's a conditional equivalence and that there's a difference in the cost to compute those two objects (otherwise we'd ignore the conditional equivalence). Presumably the range on outer_ctx_1389 is global VARYING and iftmp.2373_1515 is global non-zero.
[Bug tree-optimization/103278] New: [12 Regression] Recent change to cddce inhibits switch optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103278

Bug ID: 103278
Summary: [12 Regression] Recent change to cddce inhibits switch optimization
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at gcc dot gnu.org
Target Milestone: ---

On iq2000-elf this change:

commit 045206450386bcd774db3bde0c696828402361c6
Author: Richard Biener
Date: Fri Nov 12 10:21:22 2021 +0100

    tree-optimization/102880 - improve CD-DCE
    [ ... ]

is preventing the if-to-switch optimization from converting the condition chain in tree-ssa/if-to-switch-3.c into a switch statement:

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized" } */

int IsMySuperRandomChar(int aChar)
{
  return aChar == 0x0009 || aChar == 0x000A ||
         aChar == 0x000C || aChar == 0x000D ||
         aChar == 0x0020 || aChar == 0x0030;
}

/* { dg-final { scan-tree-dump "Condition chain with \[^\n\r]\* BBs transformed into a switch statement." "iftoswitch" } } */

After today's cd-dce change we no longer turn that into a switch:

Before:
;; Canonical GIMPLE case clusters: 9-10 12 13 32 48
;; JT can be built: JT(values:6 comparisons:10 range:40 density: 25.00%):9-48
j.c:8:26: optimized: Condition chain with 3 BBs transformed into a switch statement.

After:
;; Canonical GIMPLE case clusters: 9-10 12-13 32 48
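As a sketch of the transformation the test expects, the condition chain from the testcase can be written by hand as a switch over the same six values (9, 10, 12, 13, 32 and 48, matching the case clusters in the dump). The names is_char_chain, is_char_switch and forms_agree are hypothetical; the code only illustrates the source-level equivalence and says nothing about how the iftoswitch pass is implemented.

```c
/* The condition chain, as in if-to-switch-3.c (renamed for the sketch).  */
int is_char_chain (int c)
{
  return c == 0x0009 || c == 0x000A ||
         c == 0x000C || c == 0x000D ||
         c == 0x0020 || c == 0x0030;
}

/* Hand-written switch form covering the same six values.  */
int is_char_switch (int c)
{
  switch (c)
    {
    case 0x0009:
    case 0x000A:
    case 0x000C:
    case 0x000D:
    case 0x0020:
    case 0x0030:
      return 1;
    default:
      return 0;
    }
}

/* Returns 1 if both forms give the same answer for every c in [lo, hi].  */
int forms_agree (int lo, int hi)
{
  for (int c = lo; c <= hi; c++)
    if (is_char_chain (c) != is_char_switch (c))
      return 0;
  return 1;
}
```

Calling forms_agree over a range that covers all six values, for example -64 to 256, is a quick way to confirm the two forms match.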
[Bug tree-optimization/103278] [12 Regression] Recent change to cddce inhibits switch optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103278 --- Comment #3 from Jeffrey A. Law ---
Note we also see these regressions:

rl78-elf: if-to-switch-5, if-to-switch-9
xstormy16-elf: if-to-switch-9
sh3-linux-gnu, sh3eb-linux-gnu: gcc.target/sh/pr51244-19.c, but I think this is fixable with a trivial sh.md change
[Bug tree-optimization/103226] [12 Regression] Recent change to copy-headers causes execution failure for gcc.dg/torture/pr80974
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103226 --- Comment #10 from Jeffrey A. Law --- Aldy, the trick is to not build the C++ runtime ;-) So instead of "make" use "make all-gcc && make all-target-libgcc" to build the compiler and libgcc runtime. Then use "make install-gcc install-target-libgcc" to install the compiler and libgcc runtime. Once that completes you can proceed to build & install newlib. Make sure to use the same --prefix for each component (binutils-gdb, gcc, newlib) and they should.
[Bug tree-optimization/102756] [12 Regression] Complete unrolling is too senative to PRE; c-c++-common/torture/vector-compare-2.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756 --- Comment #4 from Jeffrey A. Law --- I could also set up a toolchain ready-to-debug in an AWS instance that you could use if that would be helpful.
[Bug tree-optimization/102756] [12 Regression] Complete unrolling is too senative to PRE; c-c++-common/torture/vector-compare-2.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756 --- Comment #5 from Jeffrey A. Law --- Ignore last comment. Meant for a different BZ.
[Bug tree-optimization/103226] [12 Regression] Recent change to copy-headers causes execution failure for gcc.dg/torture/pr80974
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103226 --- Comment #11 from Jeffrey A. Law --- Aldy, I could also set up a cross toolchain, ready for debugging in an AWS instance if that would be helpful.