On Tue, Aug 13, 2024 at 09:13:37AM +0100, Roger Sayle wrote:
> 
> Hi Xianmiao,
> I have no objection to reverting that original patch, if it was indeed made
> obsolete by
> later changes to the i386 backend.
> 
> The theory at the time was that it was possible for backends to define mov
> instructions
> that emitted clobbers if necessary, but it's very difficult for a backend or
> any of the
> RTL middle-end passes to eliminate/remove these clobbers (that interfere
> with some
> passes).  In the x86_64 case, the high and low parts were already in the
> correct
> registers, but the clobber caused reload/register allocation to copy them
> somewhere
> else, then copy them back again after the clobber.

I agree that we should try to eliminate clobber to reduce the generation of
redundant instructions; I just think that removing clobber in lower-subreg
is not well-suited for the current GCC framework.

As I described in the commit message, the absence of clobber could
potentially lead to the register's lifetime occupying the entire function,
according to the algorithm of the 'df_lr_bb_local_compute' function.

I tried to create a use case on X86 that could trigger this side effect,
but I wasn't able to construct one. However, my analysis revealed that
there is a coincidental aspect that causes X86 to avoid this situation.
Let me first illustrate the coincidental point I discovered on X86,
which will help us better understand the issue. Afterwards, I'll provide
examples in RISC-V to demonstrate scenarios that could lead to side effects.

For the test case:
  extern __int128 foo1(__int128, __int128);
  __int128 foo(__int128 x, __int128 y)
  {
    if (x > y)
      return foo1 (x+y, x);
    else
      return foo1 (y, y -x);
  }

As the CLOBBER expression has been thoroughly eliminated by your later patch,
I have reverted the commit to 
  
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd
to observe the phenomenon. 
The X86 architecture will expand the multi-register mode move operation into
subreg assignments during the 'expand' phase (which may differ from other
architectures, particularly from certain implementations in RISC-V), as shown 
below.
  ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe 
hot
  ;;  prev block 0, next block 4, flags: (NEW, REACHABLE, RTL, VISITED)
  ;;  pred:       ENTRY [always]  count:1073741824 (estimated locally) 
(FALLTHRU)
  (note 9 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (insn 2 9 3 2 (set (reg:DI 87)
          (reg:DI 5 di [ x ])) "test.c":4:1 -1
       (nil))
  (insn 3 2 4 2 (set (reg:DI 88)
          (reg:DI 4 si [ x+8 ])) "test.c":4:1 -1
       (nil))
  (insn 4 3 5 2 (set (reg:TI 86)
          (subreg:TI (reg:DI 87) 0)) "test.c":4:1 -1
       (nil))
  (insn 5 4 6 2 (set (subreg:DI (reg:TI 86) 8)
          (reg:DI 88)) "test.c":4:1 -1
       (nil))
  (insn 6 5 7 2 (set (reg/v:TI 85 [ x ])
          (reg:TI 86)) "test.c":4:1 -1
       (nil))

For register 86, there is only an assignment via subreg, without any clobber or 
an
assignment of the entire register. According to the algorithm of 
'df_lr_bb_local_compute',
even if register 86 has completely updated its value through subreg, it still 
cannot use the formula
  IN = (OUT & ~DEF) | USE
It will still be considered within the LR IN. And its lifetime spans the entire 
function.
Below is the LR information and instructions for the same block after `reginfo´ 
pass(Due to
the elimination of register 86, we can focus on register 85).
  ;; lr  in        1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 
[frame] 85
  ;; lr  use       1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 
[frame] 85
  ;; lr  def       17 [flags] 85 87 88 89
  ;; live  in      1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 
[frame]
  ;; live  gen     17 [flags] 85 87 88 89
  ;; live  kill
  (note 9 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (insn 2 9 3 2 (set (reg:DI 87 [ x ])
          (reg:DI 5 di [ x ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 5 di [ x ])
          (nil)))
  (insn 3 2 48 2 (set (reg:DI 88 [ x+8 ])
          (reg:DI 4 si [ x+8 ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 4 si [ x+8 ])
          (nil)))
  (insn 48 3 49 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 0)
          (reg:DI 87 [ x ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 101 [ x ])
          (nil)))
  (insn 49 48 7 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 8)
          (reg:DI 88 [ x+8 ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 102 [+8 ])
          (nil)))
  (insn 7 49 8 2 (set (reg/v:TI 89 [ y ])
          (reg:TI 1 dx [ y ])) "test.c":4:1 87 {*movti_internal}
       (expr_list:REG_DEAD (reg:TI 1 dx [ y ])
          (nil)))
  (note 8 7 11 2 NOTE_INSN_FUNCTION_BEG)
  (insn 11 8 12 2 (set (reg:CC 17 flags)
          (compare:CC (subreg:DI (reg/v:TI 89 [ y ]) 0)
              (subreg:DI (reg/v:TI 85 [ x ]) 0))) "test.c":5:6 12 {*cmpdi_1}
       (nil))

The LR information of register 85 precisely meets the requirements of the 
`init-regs´ pass,
and after the `init-regs´ pass, an instruction will be generated to initialize 
register 85 to zero.
  (insn 2 9 3 2 (set (reg:DI 87 [ x ])
          (reg:DI 5 di [ x ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 5 di [ x ])
          (nil)))
  (insn 3 2 59 2 (set (reg:DI 88 [ x+8 ])
          (reg:DI 4 si [ x+8 ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 4 si [ x+8 ])
          (nil)))
  (insn 59 3 60 2 (clobber (reg/v:TI 85 [ x ])) "test.c":4:1 -1
       (nil))
  (insn 60 59 48 2 (set (reg/v:TI 85 [ x ])
          (const_int 0 [0])) "test.c":4:1 -1
       (nil))
  (insn 48 60 49 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 0)
          (reg:DI 87 [ x ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 87 [ x ])
          (nil)))
  (insn 49 48 7 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 8)
          (reg:DI 88 [ x+8 ])) "test.c":4:1 88 {*movdi_internal}
       (expr_list:REG_DEAD (reg:DI 88 [ x+8 ])
          (nil)))
  (insn 7 49 8 2 (set (reg/v:TI 89 [ y ])
          (reg:TI 1 dx [ y ])) "test.c":4:1 87 {*movti_internal}
       (expr_list:REG_DEAD (reg:TI 1 dx [ y ])
          (nil)))

After this, the LR information of register 85 will return to normal due to the
generation of initialization instructions, and the absence of CLOBBER in the
`lower-subreg´ pass will not have any side effects.

> 
> Out of curiosity, PR target/43644 concerns 128-bit (TImode) moves and
> argument
> passing ABI on x86_64, but your test case implies you should be seeing
> clobbers on
> riscv after movdf is expanded?  Can you describe the problematic RTL
> sequence for
> your testcase (it'll save me building a cross-compiler to riscv to
> investigate for myself).
> I presume it's an ILP32 ABI issue/constraint.

I use the use case in my patch as a use case for RISC-V:
  double foo (double a)
  {
    if (a < 0.0)
      return a + 1.0;
    else if (a > 16.0)
      return a - 3.0;
    else if (a < 300.0)
      return a - 30.0;
    else
      return a;
  }

As riscv don´t expand multi-register move in during the expand phase,
it will generate the following sequence of instructions.
  (insn 7 6 8 (set (reg:DF 136)
          (const_double:DF 0.0 [0x0.0p+0])) "r.c":3:6 -1
       (nil))
  (insn 8 7 9 (set (reg:DF 12 a2)
          (reg:DF 136)) "r.c":3:6 -1
       (nil))
  (insn 9 8 10 (set (reg:DF 10 a0)
          (reg/v:DF 135 [ a ])) "r.c":3:6 -1
       (nil))

These are normal move instructions and the `init-regs´ pass will not
generate initialization instructions for them. So the CLOBBER instructions
are important for them. In the lower-subreg pass, multi-register moves
will be expanded into subreg moves. If CLOBBER is not generated, it will
result in the live ranges of all split registers occupying the entire function.

In the IRA pass, the LR information is recalculated. Taking register 173 as
an example, we can see it in the LR IN/OUT of all blocks.
  ;; basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 
1.0000), maybe hot
  ;;  prev block 0, next block 3, flags: (REACHABLE, RTL, MODIFIED)
  ;;  pred:       ENTRY [always]  count:1073741824 (estimated locally, freq 
1.0000) (FALLTHRU)
  ;; bb 2 artificial_defs: { }
  ;; bb 2 artificial_uses: { u-1(2){ }u-1(8){ }u-1(64){ }u-1(65){ }}
  ;; lr  in        2 [sp] 8 [s0] 10 [a0] 11 [a1] 64 [arg] 65 [frame] 173 174 
175 176 177 178
  ...
  ;;  succ:       3 [26.4% (guessed)]  count:283038344 (estimated locally, freq 
0.2636) (FALLTHRU)
  ;;              4 [73.6% (guessed)]  count:790703480 (estimated locally, freq 
0.7364)
  ;; lr  out       2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 173 174 175 176 
177 178
  ;; live  out     2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166
  
  ;; basic block 3, loop depth 0, count 283038344 (estimated locally, freq 
0.2636), maybe hot
  ;;  prev block 2, next block 4, flags: (REACHABLE, RTL, MODIFIED)
  ;;  pred:       2 [26.4% (guessed)]  count:283038344 (estimated locally, freq 
0.2636) (FALLTHRU)
  ;; bb 3 artificial_defs: { }
  ;; bb 3 artificial_uses: { u-1(2){ }u-1(8){ }u-1(64){ }u-1(65){ }}
  ;; lr  in        2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 173 178
  ;; lr  use       2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166 173
  ;; lr  def       0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 [a0] 
11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 
31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 [ft7] 
40 [fs0] 41 [fs1] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 
[fa6] 49 [fa7] 50 [fs2] 51 [fs3] 52 [fs4] 53 [fs5] 54 [fs6] 55 [fs7] 56 [fs8] 
57 [fs9] 58 [fs10] 59 [fs11] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 
[vtype] 68 [vxrm] 69 [frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 
[N/A] 76 [N/A] 77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 
84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 
[N/A] 93 [N/A] 94 [N/A] 95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 
[v5] 102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 
[v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 
[v21] 118 [v22] 119 [v23] 120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 
[v29] 126 [v30] 127 [v31] 139 163 164 167 168 173
  ;; live  in      2 [sp] 8 [s0] 64 [arg] 65 [frame] 165 166
  ;; live  gen     10 [a0] 11 [a1] 12 [a2] 13 [a3] 139 163 164 167 168 173
  ;; live  kill    1 [ra]
  (note 16 12 18 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
  (note 18 16 110 3 NOTE_INSN_DELETED)
  (insn 110 18 111 3 (set (subreg:SI (reg:DI 173) 0)
          (reg:SI 165)) "r.c":4:14 276 {*movsi_internal}
       (expr_list:REG_DEAD (reg:SI 165)
          (nil)))
  (insn 111 110 17 3 (set (subreg:SI (reg:DI 173) 4)
          (reg:SI 166 [+4 ])) "r.c":4:14 276 {*movsi_internal}
       (expr_list:REG_DEAD (reg:SI 166 [+4 ])
          (nil)))
  (insn 17 111 112 3 (set (reg/f:SI 139)
          (high:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]))) "r.c":4:14 
276 {*movsi_internal}
       (nil))
  (insn 112 17 19 3 (set (reg:DF 10 a0)
          (subreg:DF (reg:DI 173) 0)) "r.c":4:14 287 {*movdf_softfloat}
       (expr_list:REG_DEAD (reg:DI 173)
          (nil)))

Therefore, I think that removing the generation of CLOBBERs in lower-subreg.cc 
is inappropriate.
Even though it may not have side effects due to certain backend optimizations 
in some architectures,
it poses a risk to what the lower-subreg pass is designed to accomplish.

BR,
Xianmiao

Reply via email to