Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps data flow consistent, but it also increases register allocation pressure and thus often creates many unwanted register-to-register moves that cannot be optimized away. This seems analogous to the partial register stall, a well-known problem on processors that do register renaming.
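
For illustration, the sequence in question looks roughly like the RTL below. This is a hand-written sketch rather than an actual compiler dump: the pseudo register numbers (100 to 102) and the choice of a DCmode register written through its two DFmode halves are assumptions made for the example.

```
;; hypothetical pseudos 100-102; hand-written, not an actual RTL dump
(clobber (reg:DC 100))                          ;; old value of the whole register dies here
(set (subreg:DF (reg:DC 100) 0) (reg:DF 101))   ;; write the real part
(set (subreg:DF (reg:DC 100) 8) (reg:DF 102))   ;; write the imaginary part
```

In this sketch the clobber refers to the whole of pseudo 100, while each following set writes only half of it.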

In my opinion, when the register to be clobbered is a composite of hard registers, we should clobber the individual elements separately; otherwise, we should clear the entire register to zero prior to use, as the "init-regs" pass does (like the partial register stall workarounds on x86 CPUs). Such redundant zero constant assignments will be removed later in the "cprop_hardreg" pass.

This patch may give better output code quality for the reasons above, especially on architectures that don't have DFmode hard registers (on architectures with such hard registers, this patch changes virtually nothing).

For example (Espressif ESP8266, Xtensa without FP hard regs):

    /* example */
    double _Complex conjugate(double _Complex z) {
      __imag__(z) *= -1;
      return z;
    }

    ;; before
    conjugate:
        movi.n  a6, -1
        slli    a6, a6, 31
        mov.n   a8, a2
        mov.n   a9, a3
        mov.n   a7, a4
        xor     a6, a5, a6
        mov.n   a2, a8
        mov.n   a3, a9
        mov.n   a4, a7
        mov.n   a5, a6
        ret.n

    ;; after
    conjugate:
        movi.n  a6, -1
        slli    a6, a6, 31
        xor     a6, a5, a6
        mov.n   a5, a6
        ret.n

gcc/ChangeLog:

        * lower-subreg.cc (resolve_simple_move):
        Add zero clear of the entire register immediately after
        the clobber.
        * expr.cc (emit_move_complex_parts):
        Change to clobber the real and imaginary parts separately
        instead of the whole complex register if possible.
---
 gcc/expr.cc         | 26 ++++++++++++++++++++------
 gcc/lower-subreg.cc |  7 ++++++-
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5..9732e8fd4e5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
 rtx_insn *
 emit_move_complex_parts (rtx x, rtx y)
 {
-  /* Show the output dies here.  This is necessary for SUBREGs
-     of pseudos since we cannot track their lifetimes correctly;
-     hard regs shouldn't appear here except as return values.  */
-  if (!reload_completed && !reload_in_progress
-      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
-    emit_clobber (x);
+  rtx_insn *re_insn, *im_insn;
 
   write_complex_part (x, read_complex_part (y, false), false, true);
+  re_insn = get_last_insn ();
   write_complex_part (x, read_complex_part (y, true), true, false);
+  im_insn = get_last_insn ();
+
+  /* Show the output dies here.  This is necessary for SUBREGs
+     of pseudos since we cannot track their lifetimes correctly.  */
+  if (can_create_pseudo_p ()
+      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
+    {
+      /* Hard regs shouldn't appear here except as return values.  */
+      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
+        {
+          emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
+                            re_insn);
+          emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
+                            im_insn);
+        }
+      else
+        emit_insn_before (gen_clobber (x), re_insn);
+    }
 
   return get_last_insn ();
 }
diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 03e9326c663..4ff0a7d1556 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
       unsigned int i;
 
       if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-        emit_clobber (dest);
+        {
+          emit_clobber (dest);
+          /* We clear the entire of dest with zero after the clobber,
+             similar to the "init-regs" pass.  */
+          emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
+        }
 
       for (i = 0; i < words; ++i)
         {
-- 
2.20.1