Hi!

As discussed in the two PRs, it is a bad idea to rely on the reload pass
keeping the IL insufficiently optimized (i.e. with noop moves); that
regresses important benchmarks on PowerPC.  The actual reason why we
generated better code on the PR90178 testcase is that during cfg cleanup
after the vzeroupper pass we had an almost empty block (containing just
(use (reg rax))) with 3 predecessors and fallthru to exit.  The mode
switching pass, which the vzeroupper pass uses under the hood, adds an
empty extra basic block on the edge from that bb to exit; as we consider
USE insns useless after reload, both basic blocks were FORWARDER_BLOCK_P
and we succeeded in threading two of the edges entering that almost empty
block over to the artificial block after it (effectively removing the USE
on those paths), while on the third path the USE was kept, which precluded
thread_jump from merging the two paths later on.
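To make that shape concrete, here is a schematic example (hypothetical,
not the actual PR90178 testcase; the function name is made up) of the
kind of function where all return paths merge into a single almost empty
block that, after reload, holds nothing but the USE of the return
register in front of the exit block:

/* Hypothetical sketch, not the PR90178 testcase: the three return
   paths become the three predecessors of one almost empty block that
   contains just (use (reg:SI ax)) before fallthru to exit.  */
int
foo (int x)
{
  if (x < 0)
    return -1;	/* predecessor 1 of the merge block */
  if (x == 0)
    return 0;	/* predecessor 2 */
  return 1;	/* predecessor 3 */
}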
The following patch adds the reversion^N, as has already been applied on
the gcc-9-branch (if somebody is willing to do the testing, a later
possibility is to replace that with a fast DCE at the end of LRA, as richi
has posted in the PR), and considers USEs of the function return value
flow_active_insn_p even after reload.  This predicate is rarely used,
mostly for the forwarder block testing.

Bootstrapped/regtested on x86_64-linux and i686-linux; additionally tested
that it only affects a very small number of stage3 gcc/*.o files and
either doesn't change the generated .text size at all, or makes it a few
bytes shorter on a single *.o file.  Preapproved by Richard on IRC,
committed to trunk.

2019-04-29  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/90257
	* cfgrtl.c (flow_active_insn_p): Return true for USE of a
	function return value.

	Revert the revert:
	2019-04-21  H.J. Lu  <hongjiu...@intel.com>

	PR target/90178
	Revert:
	2018-11-21  Uros Bizjak  <ubiz...@gmail.com>

	Revert the revert:
	2013-10-26  Vladimir Makarov  <vmaka...@redhat.com>

	Revert:
	2013-10-25  Vladimir Makarov  <vmaka...@redhat.com>

	* lra-spills.c (lra_final_code_change): Remove useless move insns.

--- gcc/lra-spills.c.jj	2019-04-27 23:48:49.973528266 +0200
+++ gcc/lra-spills.c	2019-04-29 16:07:58.095983476 +0200
@@ -740,6 +740,7 @@ lra_final_code_change (void)
   int i, hard_regno;
   basic_block bb;
   rtx_insn *insn, *curr;
+  rtx set;
   int max_regno = max_reg_num ();
 
   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
@@ -818,5 +819,19 @@ lra_final_code_change (void)
 	    }
 	  if (insn_change_p)
 	    lra_update_operator_dups (id);
+
+	  if ((set = single_set (insn)) != NULL
+	      && REG_P (SET_SRC (set)) && REG_P (SET_DEST (set))
+	      && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set)))
+	    {
+	      /* Remove a useless move insn.  IRA can generate move
+		 insns involving pseudos.  It is better to remove them
+		 earlier to speed up the compiler a bit.  It is also
+		 better to do it here, as they might not pass the final
+		 RTL check in LRA (e.g. an insn moving a control
+		 register into itself).  */
+	      lra_invalidate_insn_data (insn);
+	      delete_insn (insn);
+	    }
 	}
 }
--- gcc/cfgrtl.c.jj	2019-04-27 23:48:50.675517128 +0200
+++ gcc/cfgrtl.c	2019-04-29 16:10:34.364519413 +0200
@@ -543,7 +543,7 @@ update_bb_for_insn (basic_block bb)
 }
 
 
-/* Like active_insn_p, except keep the return value clobber around
+/* Like active_insn_p, except keep the return value use or clobber around
    even after reload.  */
 
 static bool
@@ -556,8 +556,12 @@ flow_active_insn_p (const rtx_insn *insn
      programs that fail to return a value.  Its effect is to
      keep the return value from being live across the entire
      function.  If we allow it to be skipped, we introduce the
-     possibility for register lifetime confusion.  */
-  if (GET_CODE (PATTERN (insn)) == CLOBBER
+     possibility for register lifetime confusion.
+     Similarly, keep a USE of the function return value, otherwise
+     the USE is dropped and we could fail to thread a jump if the USE
+     appears on some paths and not on others, see PR90257.  */
+  if ((GET_CODE (PATTERN (insn)) == CLOBBER
+       || GET_CODE (PATTERN (insn)) == USE)
       && REG_P (XEXP (PATTERN (insn), 0))
       && REG_FUNCTION_VALUE_P (XEXP (PATTERN (insn), 0)))
     return true;

	Jakub
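P.S. For completeness, the return value CLOBBER that the comment in
flow_active_insn_p refers to is what GCC emits for a function that can
fall off the end without returning a value; schematically (an
illustration only, not a testcase from the PRs, function name made up):

/* Illustration only: on the fallthru path no return value is set, so
   GCC emits (clobber (reg:SI ax)) to keep the stale value in the
   return register from being treated as live across the whole
   function.  The patch extends the same keep-it-after-reload
   treatment to the corresponding USE insns.  */
int
bar (int x)
{
  if (x > 0)
    return x;
  /* Falls off the end: nothing returned on this path.  */
}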