On Tue, 2025-01-21 at 20:34 +0800, Lulu Cheng wrote:
> 
> On 2025/1/21 at 6:05 PM, Xi Ruoyao wrote:
> > On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
> > > On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
> > > > On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
> > > > > On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
> > > > > /* snip */
> > > > > >     ;; This code iterator allows unsigned and signed division to be generated
> > > > > >     ;; from the same template.
> > > > > > @@ -3083,39 +3084,6 @@ (define_expand "rotl<mode>3"
> > > > > >           }
> > > > > >       });
> > > > > >     
> > > > > > -;; The following templates were added to generate "bstrpick.d + alsl.d"
> > > > > > -;; instruction pairs.
> > > > > > -;; It is required that the values of const_immalsl_operand and
> > > > > > -;; immediate_operand must have the following correspondence:
> > > > > > -;;
> > > > > > -;; (immediate_operand >> const_immalsl_operand) == 0xffffffff
> > > > > > -
> > > > > > -(define_insn "zero_extend_ashift"
> > > > > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > > > > -   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > > > -                      (match_operand 2 "const_immalsl_operand" ""))
> > > > > > -           (match_operand 3 "immediate_operand" "")))]
> > > > > > -  "TARGET_64BIT
> > > > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
> > > > > > -  [(set_attr "type" "arith")
> > > > > > -   (set_attr "mode" "DI")
> > > > > > -   (set_attr "insn_count" "2")])
> > > > > > -
> > > > > > -(define_insn "bstrpick_alsl_paired"
> > > > > > -  [(set (match_operand:DI 0 "register_operand" "=&r")
> > > > > > -   (plus:DI
> > > > > > -     (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > > > -                        (match_operand 2 "const_immalsl_operand" ""))
> > > > > > -             (match_operand 3 "immediate_operand" ""))
> > > > > > -     (match_operand:DI 4 "register_operand" "r")))]
> > > > > > -  "TARGET_64BIT
> > > > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
> > > > > > -  [(set_attr "type" "arith")
> > > > > > -   (set_attr "mode" "DI")
> > > > > > -   (set_attr "insn_count" "2")])
> > > > > > -
> > > > > Hi,
> > > > > 
> > > > > In LoongArch, the microarchitecture performs instruction fusion on
> > > > > bstrpick.d+alsl.d.
> > > > > 
> > > > > This modification may cause the two instructions to no longer be
> > > > > adjacent.
> > > > > 
> > > > > So I think these two templates cannot be deleted. I will test the
> > > > > impact of this patch on SPEC today.
> > > > Oops.  I guess we can salvage it with TARGET_SCHED_MACRO_FUSION_P and
> > > > TARGET_SCHED_MACRO_FUSION_PAIR_P.  And I'd like to know more details:
> > > > 
> > > > 1. Does the fusion apply to all bstrpick.d + alsl.d pairs, or only to
> > > > bstrpick.d rd, rs, 31, 0?
> > > > 2. Does the fusion also apply to bstrpick.d + slli.d, or do we really
> > > > have to write the strange "alsl.d rd, rs, r0, shamt" instruction?
> > > > 
> > > Currently, instruction fusion can only be done in the following situation:
> > > 
> > > bstrpick.d rd, rs, 31, 0 + alsl.d rd1,rj,rk,shamt and "rd = rj"
> > So the easiest solution seems to be just adding the two patterns back.
> > I'm bootstrapping and regtesting the attached patch.
> 
> It seems more formal to do this through TARGET_SCHED_MACRO_FUSION_P and
> TARGET_SCHED_MACRO_FUSION_PAIR_P. I found the SPEC test item that generates
> this instruction pair. I implemented these two hooks to see if it works.

Another problem is that without bstrpick_alsl_paired some test cases
regress, like:

struct Pair { unsigned long a, b; };

struct Pair
test (struct Pair p, long x, long y)
{
  p.a &= 0xffffffff;
  p.a <<= 2;
  p.a += x;
  p.b &= 0xffffffff;
  p.b <<= 2;
  p.b += x;
  return p;
}

In GCC 13 the result is:

        or      $r12,$r4,$r0
        bstrpick.d      $r4,$r12,31,0
        alsl.d  $r4,$r4,$r6,2
        or      $r12,$r5,$r0
        bstrpick.d      $r5,$r12,31,0
        alsl.d  $r5,$r5,$r6,2
        jr      $r1

But now:

        addi.w  $r12,$r0,-4                     # 0xfffffffffffffffc
        lu32i.d $r12,0x3
        slli.d  $r5,$r5,2
        slli.d  $r4,$r4,2
        and     $r5,$r5,$r12
        and     $r4,$r4,$r12
        add.d   $r4,$r4,$r6
        add.d   $r5,$r5,$r6
        jr      $r1

While both are suboptimal, the new code generation is clearly worse.  I'm
still unsure how to fix it properly, so maybe for now we should just restore
bstrpick_alsl_paired to fix the regression.
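
For reference, here is a minimal sketch (plain C, not GCC code) of why the
condition in the removed patterns, (imm >> shamt) == 0xffffffff, lets the
"shift, mask, add" form be rewritten as bstrpick.d + alsl.d.  The function
names are made up for illustration, the masks are treated as unsigned 64-bit
values, and the shift amounts 1..4 are my assumption about what
const_immalsl_operand accepts.

```c
#include <assert.h>
#include <stdint.h>

/* The RTL shape the removed pattern matched: ((x << shamt) & imm) + addend.  */
static uint64_t shift_then_mask_add(uint64_t x, unsigned shamt,
                                    uint64_t imm, uint64_t addend)
{
    return ((x << shamt) & imm) + addend;
}

/* What the two-instruction sequence computes.  */
static uint64_t pick_then_alsl(uint64_t x, unsigned shamt, uint64_t addend)
{
    uint64_t low32 = x & UINT64_C(0xffffffff); /* bstrpick.d rd, rs, 31, 0 */
    return (low32 << shamt) + addend;          /* alsl.d rd, rd, rk, shamt */
}

/* The insn condition from the removed patterns, on unsigned values.  */
static int mask_matches(uint64_t imm, unsigned shamt)
{
    return (imm >> shamt) == UINT64_C(0xffffffff);
}

/* Compare both forms on a few sample inputs for every assumed shift amount.  */
static void check_equivalence(void)
{
    const uint64_t samples[] = {
        0, 1, UINT64_C(0xffffffff), UINT64_C(0x123456789abcdef0), UINT64_MAX
    };

    for (unsigned shamt = 1; shamt <= 4; shamt++) {
        uint64_t imm = UINT64_C(0xffffffff) << shamt;
        assert(mask_matches(imm, shamt));
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
            assert(shift_then_mask_add(samples[i], shamt, imm, 42)
                   == pick_then_alsl(samples[i], shamt, 42));
    }
}
```

With shamt = 2 the mask is 0x3fffffffc, which is exactly the constant the new
code materializes with addi.w + lu32i.d above.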

And I guess we'd need zero_extend_ashift anyway because we need to use
alsl.d instead of slli.d for the fusion.
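
To make the fusion condition quoted above concrete, here is a toy model of
the adjacency check a fusion-pair predicate would have to perform.  This is
only a sketch: struct toy_insn, enum toy_opcode and toy_fusible_pair are
hypothetical names, and a real TARGET_SCHED_MACRO_FUSION_PAIR_P
implementation would inspect GCC's own insn representation rather than a
struct like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction record (not a GCC type).  */
enum toy_opcode { TOY_BSTRPICK_D, TOY_ALSL_D, TOY_SLLI_D, TOY_OTHER };

struct toy_insn {
    enum toy_opcode op;
    int rd;        /* destination register number */
    int rj;        /* first source register number */
    int rk;        /* second source register number (alsl.d only) */
    int msb, lsb;  /* bit range (bstrpick.d only) */
    int shamt;     /* shift amount (alsl.d / slli.d only) */
};

/* True only for "bstrpick.d rd, rs, 31, 0" immediately followed by
   "alsl.d rd1, rj, rk, shamt" with rj equal to the bstrpick.d destination,
   which is the single case described in this thread.  */
static bool toy_fusible_pair(const struct toy_insn *prev,
                             const struct toy_insn *curr)
{
    if (prev->op != TOY_BSTRPICK_D || curr->op != TOY_ALSL_D)
        return false;
    if (prev->msb != 31 || prev->lsb != 0)
        return false;
    return curr->rj == prev->rd;
}

/* Usage example built from the GCC 13 output in this message.  */
static void toy_self_check(void)
{
    /* bstrpick.d $r4,$r12,31,0 ; alsl.d $r4,$r4,$r6,2 */
    struct toy_insn pick = { TOY_BSTRPICK_D, 4, 12, 0, 31, 0, 0 };
    struct toy_insn alsl = { TOY_ALSL_D, 4, 4, 6, 0, 0, 2 };
    /* slli.d $r4,$r4,2 does not qualify under the stated condition.  */
    struct toy_insn slli = { TOY_SLLI_D, 4, 4, 0, 0, 0, 2 };

    assert(toy_fusible_pair(&pick, &alsl));
    assert(!toy_fusible_pair(&pick, &slli));
}
```

In this model a slli.d following the bstrpick.d never qualifies, which is
why the zero_extend_ashift case would still have to emit alsl.d with $r0.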

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
