https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
--- Comment #15 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> --- (In reply to Thomas Preud'homme from comment #14) > (In reply to rguent...@suse.de from comment #13) > > On Thu, 23 Mar 2017, thopre01 at gcc dot gnu.org wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155 > > > > > > --- Comment #12 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> --- > > > (In reply to Richard Biener from comment #9) > > > > Ah, the patches do not fix the testcase because the testcase is _not_ > > > > the > > > > PRE-creates-IV case. It's indeed simply hoisting/PRE at work > > > > transforming > > > > > > > > # a_14 = PHI <a_10(, a_5(D)> > > > > if (!b) > > > > a_8 = a_14 + 1; > > > > > > > > # a_2 = PHI <a_14(10), a_8(4)> > > > > a_10 = a_2 + 1; > > > > ... = *(a_2 + 1); > > > > > > > > to > > > > > > > > # a_14 = PHI <prephimp_12, a_5(D)> > > > > _4 = a_14 + 1; > > > > if (b) > > > > _3 = _4 + 1; > > > > > > > > # a_2 = PHI <a_14, _4> > > > > # prephitmp_12 = PHI <_4, _3> > > > > ... = *(a_2 + 1); > > > > > > > > increasing register pressure mainly because nothing figures that a_2 + 1 > > > > in the dereference can be replaced by prephitmp_12 ... > > > > > > > > So this is a missed SLSR opportunity or, in this simple form, a missed > > > > PRE/CSE opportunity. Fixing that with the following (otherwise > > > > untested) > > > > restores good code generation for the testcase: > > > > > > > > Index: gcc/tree-ssa-pre.c > > > > =================================================================== > > > > --- gcc/tree-ssa-pre.c (revision 246414) > > > > +++ gcc/tree-ssa-pre.c (working copy) > > > > @@ -4636,6 +4610,35 @@ eliminate_dom_walker::before_dom_childre > > > > } > > > > } > > > > > > > > + if (gimple_has_ops (stmt)) > > > > + for (unsigned i = 0; i < gimple_num_ops (stmt); ++i) > > > > + { > > > > + tree op = gimple_op (stmt, i); > > > > + if (op) > > > > + op = get_base_address (op); > > > > + if (op > > > > + && TREE_CODE (op) == MEM_REF > > > > + && ! integer_zerop (TREE_OPERAND (op, 1))) > > > > + { > > > > + tree ops[2]; > > > > + vn_nary_op_t nary; > > > > + ops[0] = TREE_OPERAND (op, 0); > > > > + ops[1] = TREE_OPERAND (op, 1); > > > > + tree res = vn_nary_op_lookup_pieces (2, > > > > POINTER_PLUS_EXPR, > > > > + TREE_TYPE (ops[0]), > > > > + ops, &nary); > > > > + if (res && TREE_CODE (res) == SSA_NAME) > > > > + res = eliminate_avail (res); > > > > + if (res) > > > > + { > > > > + TREE_OPERAND (op, 0) = res; > > > > + TREE_OPERAND (op, 1) > > > > + = build_int_cst (TREE_TYPE (TREE_OPERAND (op, > > > > 1)), 0); > > > > + gimple_set_modified (stmt, true); > > > > Add > > > > if (TREE_CODE (res) == SSA_NAME > > && !is_gimple_debug (stmt)) > > gimple_set_plf (SSA_NAME_DEF_STMT (res), > > NECESSARY, true); > > > > here. > > > > > > + } > > > > + } > > > > + } > > > > + > > > > if (gimple_modified_p (stmt)) > > > > { > > > > /* If a formerly non-invariant ADDR_EXPR is turned into an > > > > > > > > note that in general optimzing > > > > > > > > q = p + 1; > > > > = ...*(p + 1); > > > > > > > > "back" to *q will be undone by forwprop/stmt folding later but in this > > > > case the feeding stmt is a PHI node and not a pointer-plus. It still > > > > means that the change might be a bit too disruptive at this point > > > > (we could restricit it a bit by only handling the case where we don't > > > > replace with a pointer-plus result). > > > > > > Thanks for your work on this! Sadly GCC ICEs with this patch: > > > > > > 0xd36f53 update_dep_bb > > > gcc/tree-ssa-tail-merge.c:411 > > > 0xd38f54 stmt_update_dep_bb > > > gcc/tree-ssa-tail-merge.c:429 > > > 0xd38f54 same_succ_hash > > > gcc/tree-ssa-tail-merge.c:452 > > > 0xd38f54 find_same_succ_bb > > > gcc/tree-ssa-tail-merge.c:717 > > > 0xd39927 find_same_succ > > > gcc/tree-ssa-tail-merge.c:748 > > > 0xd39927 init_worklist > > > gcc/tree-ssa-tail-merge.c:767 > > > 0xd39927 tail_merge_optimize(unsigned int) > > > gcc/tree-ssa-tail-merge.c:1726 > > > 0xce2d6a execute > > > gcc/tree-ssa-pre.c:5164 > > > > > > > > There's progress. Performance is improved but not as much as disabling code > hoisting. I'll try to reduce the testcase again with that patch to see if I > can find a testcase that expose all issues. Funnily this led back to the Cortex-M0+ reduced testcase. With the patch in comment #13 applied we can still see a difference in the push (one register pushed Vs 0).