On Tue, Aug 13, 2024 at 09:13:37AM +0100, Roger Sayle wrote: > > Hi Xianmiao, > I have no objection to reverting that original patch, if it was indeed made > obsolete by > later changes to the i386 backend. > > The theory at the time was that it was possible for backends to define mov > instructions > that emitted clobbers if necessary, but it's very difficult for a backend or > any of the > RTL middle-end passes to eliminate/remove these clobbers (that interfere > with some > passes). In the x86_64 case, the high and low parts were already in the > correct > registers, but the clobber caused reload/register allocation to copy them > somewhere > else, then copy them back again after the clobber.
I agree that we should try to eliminate clobber to reduce the generation of redundant instructions; I just think that removing clobber in lower-subreg is not well-suited for the current GCC framework. As I described in the commit message, the absence of clobber could potentially lead to the register's lifetime occupying the entire function, according to the algorithm of the 'df_lr_bb_local_compute' function. I tried to create a use case on X86 that could trigger this side effect, but I wasn't able to construct one. However, my analysis revealed that there is a coincidental aspect that causes X86 to avoid this situation. Let me first illustrate the coincidental point I discovered on X86, which will help us better understand the issue. Afterwards, I'll provide examples in RISC-V to demonstrate scenarios that could lead to side effects. For the test case: extern __int128 foo1(__int128, __int128); __int128 foo(__int128 x, __int128 y) { if (x > y) return foo1 (x+y, x); else return foo1 (y, y -x); } As the CLOBBER expression has been thoroughly eliminated by your later patch, I have reverted the commit to https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd to observe the phenomenon. The X86 architecture will expand the multi-register mode move operation into subreg assignments during the 'expand' phase (which may differ from other architectures, particularly from certain implementations in RISC-V), as shown below. ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot ;; prev block 0, next block 4, flags: (NEW, REACHABLE, RTL, VISITED) ;; pred: ENTRY [always] count:1073741824 (estimated locally) (FALLTHRU) (note 9 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 9 3 2 (set (reg:DI 87) (reg:DI 5 di [ x ])) "test.c":4:1 -1 (nil)) (insn 3 2 4 2 (set (reg:DI 88) (reg:DI 4 si [ x+8 ])) "test.c":4:1 -1 (nil)) (insn 4 3 5 2 (set (reg:TI 86) (subreg:TI (reg:DI 87) 0)) "test.c":4:1 -1 (nil)) (insn 5 4 6 2 (set (subreg:DI (reg:TI 86) 8) (reg:DI 88)) "test.c":4:1 -1 (nil)) (insn 6 5 7 2 (set (reg/v:TI 85 [ x ]) (reg:TI 86)) "test.c":4:1 -1 (nil)) For register 86, there is only an assignment via subreg, without any clobber or an assignment of the entire register. According to the algorithm of 'df_lr_bb_local_compute', even if register 86 has completely updated its value through subreg, it still cannot use the formula IN = (OUT & ~DEF) | USE It will still be considered within the LR IN. And its lifetime spans the entire function. Below is the LR information and instructions for the same block after `reginfo´ pass(Due to the elimination of register 86, we can focus on register 85). ;; lr in 1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 [frame] 85 ;; lr use 1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 [frame] 85 ;; lr def 17 [flags] 85 87 88 89 ;; live in 1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 [frame] ;; live gen 17 [flags] 85 87 88 89 ;; live kill (note 9 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 9 3 2 (set (reg:DI 87 [ x ]) (reg:DI 5 di [ x ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 5 di [ x ]) (nil))) (insn 3 2 48 2 (set (reg:DI 88 [ x+8 ]) (reg:DI 4 si [ x+8 ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 4 si [ x+8 ]) (nil))) (insn 48 3 49 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 0) (reg:DI 87 [ x ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 101 [ x ]) (nil))) (insn 49 48 7 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 8) (reg:DI 88 [ x+8 ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 102 [+8 ]) (nil))) (insn 7 49 8 2 (set (reg/v:TI 89 [ y ]) (reg:TI 1 dx [ y ])) "test.c":4:1 87 {*movti_internal} (expr_list:REG_DEAD (reg:TI 1 dx [ y ]) (nil))) (note 8 7 11 2 NOTE_INSN_FUNCTION_BEG) (insn 11 8 12 2 (set (reg:CC 17 flags) (compare:CC (subreg:DI (reg/v:TI 89 [ y ]) 0) (subreg:DI (reg/v:TI 85 [ x ]) 0))) "test.c":5:6 12 {*cmpdi_1} (nil)) The LR information of register 85 precisely meets the requirements of the `init-regs´ pass, and after the `init-regs´ pass, an instruction will be generated to initialize register 85 to zero. (insn 2 9 3 2 (set (reg:DI 87 [ x ]) (reg:DI 5 di [ x ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 5 di [ x ]) (nil))) (insn 3 2 59 2 (set (reg:DI 88 [ x+8 ]) (reg:DI 4 si [ x+8 ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 4 si [ x+8 ]) (nil))) (insn 59 3 60 2 (clobber (reg/v:TI 85 [ x ])) "test.c":4:1 -1 (nil)) (insn 60 59 48 2 (set (reg/v:TI 85 [ x ]) (const_int 0 [0])) "test.c":4:1 -1 (nil)) (insn 48 60 49 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 0) (reg:DI 87 [ x ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 87 [ x ]) (nil))) (insn 49 48 7 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 8) (reg:DI 88 [ x+8 ])) "test.c":4:1 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 88 [ x+8 ]) (nil))) (insn 7 49 8 2 (set (reg/v:TI 89 [ y ]) (reg:TI 1 dx [ y ])) "test.c":4:1 87 {*movti_internal} (expr_list:REG_DEAD (reg:TI 1 dx [ y ]) (nil))) After this, the LR information of register 85 will return to normal due to the generation of initialization instructions, and the absence of CLOBBER in the `lower-subreg´ pass will not have any side effects. > > Out of curiosity, PR target/43644 concerns 128-bit (TImode) moves and > argument > passing ABI on x86_64, but your test case implies you should be seeing > clobbers on > riscv after movdf is expanded? Can you describe the problematic RTL > sequence for > your testcase (it'll save me building a cross-compiler to riscv to > investigate for myself). > I presume it's an ILP32 ABI issue/constraint. I use the use case in my patch as a use case for RISC-V: double foo (double a) { if (a < 0.0) return a + 1.0; else if (a > 16.0) return a - 3.0; else if (a < 300.0) return a - 30.0; else return a; } As riscv don´t expand multi-register move in during the expand phase, it will generate the following sequence of instructions. (insn 7 6 8 (set (reg:DF 136) (const_double:DF 0.0 [0x0.0p+0])) "r.c":3:6 -1 (nil)) (insn 8 7 9 (set (reg:DF 12 a2) (reg:DF 136)) "r.c":3:6 -1 (nil)) (insn 9 8 10 (set (reg:DF 10 a0) (reg/v:DF 135 [ a ])) "r.c":3:6 -1 (nil)) These are normal move instructions and the `init-regs´ pass will not generate initialization instructions for them. So the CLOBBER instructions are important for them. In the lower-subreg pass, multi-register moves will be expanded into subreg moves. If CLOBBER is not generated, it will result in the live ranges of all split registers occupying the entire function. In the IRA pass, the LR information is recalculated. Taking register 173 as an example, we can see it in the LR IN/OUT of all blocks. ;; basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 1.0000), maybe hot ;; prev block 0, next block 3, flags: (REACHABLE, RTL, MODIFIED) ;; pred: ENTRY [always] count:1073741824 (estimated locally, freq 1.0000) (FALLTHRU) ;; bb 2 artificial_defs: { } ;; bb 2 artificial_uses: { u-1(2){ }u-1(8){ }u-1(64){ }u-1(65){ }} ;; lr in 2 [sp] 8 [s0] 10 [a0] 11 [a1] 64 [arg] 65 [frame] 173 174 175 176 177 178 ... ;; succ: 3 [26.4% (guessed)] count:283038344 (estimated locally, freq 0.2636) (FALLTHRU) ;; 4 [73.6% (guessed)] count:790703480 (estimated locally, freq 0.7364) ;; lr out 2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 173 174 175 176 177 178 ;; live out 2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 ;; basic block 3, loop depth 0, count 283038344 (estimated locally, freq 0.2636), maybe hot ;; prev block 2, next block 4, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 2 [26.4% (guessed)] count:283038344 (estimated locally, freq 0.2636) (FALLTHRU) ;; bb 3 artificial_defs: { } ;; bb 3 artificial_uses: { u-1(2){ }u-1(8){ }u-1(64){ }u-1(65){ }} ;; lr in 2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 173 178 ;; lr use 2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 173 ;; lr def 0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 [ft7] 40 [fs0] 41 [fs1] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 50 [fs2] 51 [fs3] 52 [fs4] 53 [fs5] 54 [fs6] 55 [fs7] 56 [fs8] 57 [fs9] 58 [fs10] 59 [fs11] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31] 139 163 164 167 168 173 ;; live in 2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 ;; live gen 10 [a0] 11 [a1] 12 [a2] 13 [a3] 139 163 164 167 168 173 ;; live kill 1 [ra] (note 16 12 18 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (note 18 16 110 3 NOTE_INSN_DELETED) (insn 110 18 111 3 (set (subreg:SI (reg:DI 173) 0) (reg:SI 165)) "r.c":4:14 276 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 165) (nil))) (insn 111 110 17 3 (set (subreg:SI (reg:DI 173) 4) (reg:SI 166 [+4 ])) "r.c":4:14 276 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 166 [+4 ]) (nil))) (insn 17 111 112 3 (set (reg/f:SI 139) (high:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]))) "r.c":4:14 276 {*movsi_internal} (nil)) (insn 112 17 19 3 (set (reg:DF 10 a0) (subreg:DF (reg:DI 173) 0)) "r.c":4:14 287 {*movdf_softfloat} (expr_list:REG_DEAD (reg:DI 173) (nil))) Therefore, I think that removing the generation of CLOBBERs in lower-subreg.cc is inappropriate. Even though it may not have side effects due to certain backend optimizations in some architectures, it poses a risk to what the lower-subreg pass is designed to accomplish. BR, Xianmiao