On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <a...@gjlay.de> wrote: > > > > Am 02.07.24 um 15:48 schrieb Richard Biener: > > On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <a...@gjlay.de> wrote: > >> > >> Hi Jeff, > >> > >> This is a patch to get correct code out of 64-bit > >> loads from address-space __memx. > >> > >> The AVR address-spaces may require that move insns issue > >> calls to library support functions, a fact that -ftree-ter > >> doesn't account for. tree-ssa-ter.cc then replaces an > >> expression across such a library call, resulting in wrong code. > >> > >> This patch disables that pass per default on avr, as there is no > >> more fine grained way to avoid malicious optimizations. > >> The pass can still be re-enabled by means of explicit -ftree-ter. > >> > >> Ok to apply? > > > > I think this requires more details on what goes wrong - I assume > > it's not stmt reordering that effectively happens but recursive > > expand_expr on SSA defs when those invoke libcalls? In that > > case this would point to a deeper issue. > > The difference is that with TER, we get a hard reg in .expand > for a movdi from 24-bit address-space __memx. > > Such moves require library calls, which in turn require > specific hard registers. As avr backend has no movdi, the > moddi gets expanded as 8 * movqi, and that does not work > when the target registers are hard regs, as some of them > are clobbered by the libcalls.
So I see (insn 18 17 19 2 (parallel [ (set (reg:QI 22 r22 [+4 ]) (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) "t.c":12:13 38 {xloadqi_A} (nil)) (insn 19 18 20 2 (set (reg:PSI 56) (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} (nil)) (insn 20 19 21 2 (parallel [ (set (reg/f:PSI 57) (plus:PSI (reg/f:PSI 47) (const_int 5 [0x5]))) (clobber (scratch:QI)) ]) "t.c":12:13 205 {addpsi3} (nil)) (insn 21 20 22 2 (parallel [ (set (reg:QI 23 r23 [+5 ]) (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) "t.c":12:13 38 {xloadqi_A} (nil)) for example - insn 21 clobbers r22 which is also the destination of insn 18. With -fno-tree-ter those oddly get _no_ intermediate reg but we have (insn 9 8 10 2 (parallel [ (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) "t.c":12:13 38 {xloadqi_A} (nil)) but since on GIMPLE we have DImode loads I don't see how TER comes into play here - TER should favor the second code generation, not the first ... (or TER shouldn't play into here at all). with -fno-tree-ter we come via #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) at /space/rguenther/src/gcc/gcc/expr.cc:6740 #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) at /space/rguenther/src/gcc/gcc/expr.cc:6461 while with TER we instead have #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we are not passing a target or a mode. I think the issue is that the expansion at some point doesn't expect the result to end up in a hard register. Maybe define_expand are not supposed to do that but maybe expansion needs to fix up. A first thought was diff --git a/gcc/expr.cc b/gcc/expr.cc index ffbac513692..1509acad02e 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx target, machine_mode tmode, gcc_unreachable (); } set_curr_insn_location (saved_loc); + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) + { + rtx tem = gen_reg_rtx (GET_MODE (r)); + emit_move_insn (tem, r); + r = tem; + } if (REG_P (r) && !REG_EXPR (r)) set_reg_attrs_for_decl_rtl (lhs, r); return r; but of course that's not the place to fix - this sees (mem/u/c:DI (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) as result and things go wrong somewhere in the chain of expanding things from the return, possibly at the point of expanding the plus and there possibly when building subregs of the DImode mem. You'd have to trace that down but the fix in the end is to do sth like the above or alternatively, in the expander producing the 'xload' make sure to not allow a hardreg as destination when you can still create pseudos? Richard. > Moreover, even with TER, the code is no more efficient than > without it, so it's not clear what's the point in propagating > hard regs into expander operands. Later passes like fwprop1 and > combine can do that, too. > > Requiring libcalls in a mov insn is quite special indeed, > and there is no way to tell that to TER. TER itself does not > optimize code involving libcalls, so it knows they are fragile. > > > So - if the wrongness is already apparent in the RTL expansion > > pass dump can you quote the respective pieces and explain why? > > It expand a 64-bit move from __memx address-space to registers > R18...R25. This is broken into 8 QI moves to these regs, but > the movqi requires a libcall in some situations, which pass their > arguments in R21...R25. Hence the libcalls clobber some of the > destination regs. > > It would already help when TER would not propagate hard-regs into > expander operands. > > Johann > > >> As an alternative, the option could be disabled permanently in > >> avr.cc::avr_option_override(). > >> > >> Johann > >> > >> -- > >> > >> AVR: middle-end/87376 - Use -fno-tree-ter per default. > >> > >> Temporary expression replacement might replace expressions across > >> library calls, for example with move insn from address-space __memx > >> like in PR87376. -ftree-ter has no way where the backend could hook > >> in to avoid only problematic replacements, thus kick it out altogether. > >> > >> PR middle-end/87376 > >> gcc/ > >> * common/config/avr/avr-common.cc (avr_option_optimization_table) > >> <OPT_ftree_ter>: Set to 0. > >> gcc/testsuite/ > >> * gcc.target/avr/torture/pr87376-memx.c: New test.