On Tue, 2025-01-21 at 20:34 +0800, Lulu Cheng wrote:
>
> On 2025/1/21 at 6:05 PM, Xi Ruoyao wrote:
> > On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
> > > On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
> > > > On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
> > > > > On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
> > > > > /* snip */
> > > > > >  ;; This code iterator allows unsigned and signed division to be generated
> > > > > >  ;; from the same template.
> > > > > > @@ -3083,39 +3084,6 @@ (define_expand "rotl<mode>3"
> > > > > >      }
> > > > > >    });
> > > > > >
> > > > > > -;; The following templates were added to generate "bstrpick.d + alsl.d"
> > > > > > -;; instruction pairs.
> > > > > > -;; It is required that the values of const_immalsl_operand and
> > > > > > -;; immediate_operand must have the following correspondence:
> > > > > > -;;
> > > > > > -;; (immediate_operand >> const_immalsl_operand) == 0xffffffff
> > > > > > -
> > > > > > -(define_insn "zero_extend_ashift"
> > > > > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > > > > -	(and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > > > -			   (match_operand 2 "const_immalsl_operand" ""))
> > > > > > -		(match_operand 3 "immediate_operand" "")))]
> > > > > > -  "TARGET_64BIT
> > > > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
> > > > > > -  [(set_attr "type" "arith")
> > > > > > -   (set_attr "mode" "DI")
> > > > > > -   (set_attr "insn_count" "2")])
> > > > > > -
> > > > > > -(define_insn "bstrpick_alsl_paired"
> > > > > > -  [(set (match_operand:DI 0 "register_operand" "=&r")
> > > > > > -	(plus:DI
> > > > > > -	  (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > > > -			     (match_operand 2 "const_immalsl_operand" ""))
> > > > > > -		  (match_operand 3 "immediate_operand" ""))
> > > > > > -	  (match_operand:DI 4 "register_operand" "r")))]
> > > > > > -  "TARGET_64BIT
> > > > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
> > > > > > -  [(set_attr "type" "arith")
> > > > > > -   (set_attr "mode" "DI")
> > > > > > -   (set_attr "insn_count" "2")])
> > > > > > -
> > > > > Hi,
> > > > >
> > > > > In LoongArch, the microarchitecture has performed instruction fusion on
> > > > > bstrpick.d+alsl.d.
> > > > >
> > > > > This modification may cause the two instructions to not be close together.
> > > > >
> > > > > So I think these two templates cannot be deleted. I will test the impact
> > > > > of this patch on the spec today.
> > > > Oops.  I guess we can salvage it with TARGET_SCHED_MACRO_FUSION_P and
> > > > TARGET_SCHED_MACRO_FUSION_PAIR_P.  And I'd like to know more details:
> > > >
> > > > 1. Is the fusion applying to all bstrpick.d + alsl.d, or only bstrpick.d
> > > > rd, rs, 31, 0?
> > > > 2. Is the fusion also applying to bstrpick.d + slli.d, or we really have
> > > > to write the strange "alsl.d rd, rs, r0, shamt" instruction?
> > > >
> > > Currently, command fusion can only be done in the following situations:
> > >
> > > bstrpick.d rd, rs, 31, 0 + alsl.d rd1,rj,rk,shamt and "rd = rj"
> > So the easiest solution seems just adding the two patterns back, I'm
> > bootstrapping and regtesting the patch attached.
>
> It seems to be more formal through TARGET_SCHED_MACRO_FUSION_P and
> TARGET_SCHED_MACRO_FUSION_PAIR_P. I found the spec test item that generated
> this instruction pair. I implemented these two hooks to see if it works.

Another problem is that without bstrpick_alsl_paired some test cases
regress, for example:

struct Pair { unsigned long a, b; };

struct Pair
test (struct Pair p, long x, long y)
{
  p.a &= 0xffffffff;
  p.a <<= 2;
  p.a += x;
  p.b &= 0xffffffff;
  p.b <<= 2;
  p.b += x;
  return p;
}

In GCC 13 the result is:

	or	$r12,$r4,$r0
	bstrpick.d	$r4,$r12,31,0
	alsl.d	$r4,$r4,$r6,2
	or	$r12,$r5,$r0
	bstrpick.d	$r5,$r12,31,0
	alsl.d	$r5,$r5,$r6,2
	jr	$r1

But now:

	addi.w	$r12,$r0,-4			# 0xfffffffffffffffc
	lu32i.d	$r12,0x3
	slli.d	$r5,$r5,2
	slli.d	$r4,$r4,2
	and	$r5,$r5,$r12
	and	$r4,$r4,$r12
	add.d	$r4,$r4,$r6
	add.d	$r5,$r5,$r6
	jr	$r1

While both are suboptimal, the new code is clearly worse: it
materializes the mask 0x3fffffffc in a register and needs three
instructions per field instead of two.  I'm still unsure how to fix it
properly, so for now we should probably just restore
bstrpick_alsl_paired to fix the regression.

I guess we need zero_extend_ashift anyway, because the fusion requires
alsl.d rather than slli.d.

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
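
For reference, the condition on the two patterns,
(immediate_operand >> const_immalsl_operand) == 0xffffffff, can be
sanity-checked with a small standalone C sketch.  This is only a model,
not GCC code: the function names are made up for illustration, and it
assumes that "alsl.d rd,rj,rk,shamt" computes (rj << shamt) + rk with
shamt in 1..4 and that "bstrpick.d rd,rs,31,0" zero-extends the low 32
bits.  Under those assumptions it compares the instruction pair against
the RTL shape (plus (and (ashift rs shamt) imm) rk).

```c
#include <stdint.h>

/* Mask obtained when "x & 0xffffffff" is followed by a left shift by sh. */
static uint64_t shifted_mask(unsigned sh)
{
    return (uint64_t)0xffffffffu << sh;
}

/* Model of the instruction pair (assumed semantics, see above):
     bstrpick.d t, rs, 31, 0   ->  t  = rs & 0xffffffff
     alsl.d     rd, t, rk, sh  ->  rd = (t << sh) + rk  */
static uint64_t model_bstrpick_alsl(uint64_t rs, uint64_t rk, unsigned sh)
{
    uint64_t t = rs & 0xffffffffu;
    return (t << sh) + rk;
}

/* Model of the RTL shape: (plus (and (ashift rs sh) imm) rk). */
static uint64_t model_rtl(uint64_t rs, uint64_t rk, unsigned sh, uint64_t imm)
{
    return ((rs << sh) & imm) + rk;
}

/* Returns 1 if both models agree for sh = 1..4 on a few sample inputs. */
static int models_agree(void)
{
    static const uint64_t samples[] = {
        0, 1, 0xffffffffu, 0x100000000ull, 0xdeadbeefcafebabeull, UINT64_MAX
    };
    const unsigned n = sizeof samples / sizeof samples[0];

    for (unsigned sh = 1; sh <= 4; sh++) {
        uint64_t imm = shifted_mask(sh);

        /* The condition used by the insn patterns. */
        if ((imm >> sh) != 0xffffffffu)
            return 0;

        for (unsigned i = 0; i < n; i++)
            for (unsigned j = 0; j < n; j++)
                if (model_bstrpick_alsl(samples[i], samples[j], sh)
                    != model_rtl(samples[i], samples[j], sh, imm))
                    return 0;
    }
    return 1;
}
```

With shamt = 2, shifted_mask gives 0x3fffffffc, which is the constant
the new code builds with addi.w + lu32i.d in the test case above.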