Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended

Richard Sandiford Fri, 24 Jan 2025 03:45:02 -0800

Lulu Cheng <chengl...@loongson.cn> writes:
> 在 2025/1/24 下午3:58, Richard Sandiford 写道:
>> Lulu Cheng <chengl...@loongson.cn> writes:
>>> 在 2025/1/22 上午8:49, Xi Ruoyao 写道:
>>>> The second source register of this insn cannot be the same as the
>>>> destination register.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>    * config/loongarch/loongarch.md
>>>>    (<optab>_alsl_reversesi_extended): Add '&' to the destination
>>>>    register constraint and append '0' to the first source register
>>>>    constraint to indicate the destination register cannot be same
>>>>    as the second source register, and change the split condition to
>>>>    reload_completed so that the insn will be split only after RA in
>>>>    order to obtain allocated registers that satisfy the above
>>>>    constraints.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>    * gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
>>>>    test.
>>>> ---
>>>>    gcc/config/loongarch/loongarch.md             |  6 +++---
>>>>    .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++++++++++++++++++
>>>>    2 files changed, 24 insertions(+), 3 deletions(-)
>>>>    create mode 100644 
>>>> gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
>>>>
>>>> diff --git a/gcc/config/loongarch/loongarch.md 
>>>> b/gcc/config/loongarch/loongarch.md
>>>> index 223e2b9f37f..1392325038c 100644
>>>> --- a/gcc/config/loongarch/loongarch.md
>>>> +++ b/gcc/config/loongarch/loongarch.md
>>>> @@ -3160,13 +3160,13 @@ (define_insn_and_split 
>>>> "<optab>_shift_reverse<X:mode>"
>>>>    ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
>>>>    ;; our own.
>>>>    (define_insn_and_split "<optab>_alsl_reversesi_extended"
>>>> -  [(set (match_operand:DI 0 "register_operand" "=r")
>>>> +  [(set (match_operand:DI 0 "register_operand" "=&r")
>>>>            (sign_extend:DI
>>>>              (plus:SI
>>>>                (subreg:SI
>>>>                  (any_bitwise:DI
>>>>                    (ashift:DI
>>>> -            (match_operand:DI 1 "register_operand" "r")
>>>> +            (match_operand:DI 1 "register_operand" "r0")
>>>>                      (match_operand:SI 2 "const_immalsl_operand" ""))
>>>>                    (match_operand:DI 3 "const_int_operand" "i"))
>>>>                  0)
>>>> @@ -3175,7 +3175,7 @@ (define_insn_and_split 
>>>> "<optab>_alsl_reversesi_extended"
>>>>       && loongarch_reassoc_shift_bitwise (<is_and>, operands[2], 
>>>> operands[3],
>>>>                                           SImode)"
>>>>      "#"
>>>> -  "&& true"
>>>> +  "&& reload_completed"
>>> I have no problem with this patch.
>>>
>>> But, I have always been confused about the use of reload_completed.
>>>
>>> I can understand that it needs to be true here, but I don't quite
>>> understand the following:
>>>
>>> ```
>>>
>>> (define_insn_and_split "*zero_extendsidi2_internal"
>>>     [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
>>>           (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand"
>>> "r,m,ZC,k")))]
>>>     "TARGET_64BIT"
>>>     "@
>>>      bstrpick.d\t%0,%1,31,0
>>>      ld.wu\t%0,%1
>>>      #
>>>      ldx.wu\t%0,%1"
>>>     "&& reload_completed
>>>      && MEM_P (operands[1])
>>>      && (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0),
>>> SImode)
>>>          && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0),
>>> SImode))
>>>      && !paradoxical_subreg_p (operands[0])"
>>>     [(set (match_dup 3) (match_dup 1))
>>>      (set (match_dup 0)
>>>           (ior:DI (zero_extend:DI
>>>                     (subreg:SI (match_dup 0) 0))
>>>                   (match_dup 2)))]
>>>     {
>>>       operands[1] = gen_lowpart (SImode, operands[1]);
>>>       operands[3] = gen_lowpart (SImode, operands[0]);
>>>       operands[2] = const0_rtx;
>>>     }
>>>     [(set_attr "move_type" "arith,load,load,load")
>>>      (set_attr "mode" "DI")])
>>> ```
>>>
>>> What is the role of reload_complete here?
>> Yeah, I agree it looks odd.  In particular, operands[0] should never be
>> a subreg after RA, so the paradoxical_subreg_p test shouldn't be needed.
>> And the hard-coded (subreg:SI ... 0) in the expansion pattern doesn't
>> seem correct for hard registers -- it should be folded down to a single
>> (reg:SI ...) instead, as for operands[3].
>>
>> Thanks,
>> Richard
>
> Now I have a very vague idea of when reload_completed needs to be judged
>
> in the split stage and when it does not need to be judged.:-(
>
> Could you please give me some guidance?


Two of the main uses of reload_completed in splits that I know of are:

(1) Splitting an instruction that has multiple alternatives, in cases
    where the choice between splitting and not splitting depends on
    the register allocation.  An aarch64 example of this is:

    (define_insn_and_split "aarch64_simd_mov_from_<mode>low"
      [(set (match_operand:<VHALF> 0 "register_operand")
            (vec_select:<VHALF>
              (match_operand:VQMOV_NO2E 1 "register_operand")
              (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half")))]
      "TARGET_FLOAT"
      {@ [ cons: =0 , 1 ; attrs: type   , arch      ]
         [ w        , w ; mov_reg       , simd      ] #
         [ ?r       , w ; neon_to_gp<q> , base_simd ] umov\t%0, %1.d[0]
         [ ?r       , w ; f_mrc         , *         ] fmov\t%0, %d1
      }
      "&& reload_completed && aarch64_simd_register (operands[0], <VHALF>mode)"
      [(set (match_dup 0) (match_dup 1))]
      {
        operands[1] = aarch64_replace_reg_mode (operands[1], <VHALF>mode);
      }
      [(set_attr "length" "4")]
    )

    Here, we want to split the first alternative (the one where the
    destination is a SIMD register), but we don't know until after RA
    whether the destination is a SIMD register.

(2) Splitting an instruction that the RA finds easier to allocate when
    unsplit.  A common instance of this is multiword moves.  An aarch64
    example is:

    (define_split
      [(set (match_operand:VSTRUCT_2QD 0 "register_operand")
            (match_operand:VSTRUCT_2QD 1 "register_operand"))]
      "TARGET_FLOAT && reload_completed"
      [(const_int 0)]
    {
      aarch64_simd_emit_reg_reg_move (operands, <VSTRUCT_ELT>mode, 2);
      DONE;
    })

    In particular, the unsplit form allows input and output registers to
    overlap.  The RA would not allow overlap if the instructions were
    split before RA (since the RA doesn't track the liveness of individual
    SIMD registers in multi-register tuples).

    This might become less of an issue in future, if the RA does become
    able to track the liveness of individual registers in a multi-register
    value.

There'll be other uses too, though.

Richard

Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended

Reply via email to