> If we encounter a uarch where the other sequence is better, then I think
> we can do something like query costs or the like and select between the
> approaches -- but no need to do that now.
> So OK for the trunk.
Thanks, patch will be committed soon.
------------------ Original ------------------
From: "Jeff Law" <[email protected]>
Date: Sat, Aug 12, 2023 07:02 AM
To: "Lehua Ding" <[email protected]>; "gcc-patches" <[email protected]>
Cc: "juzhe.zhong" <[email protected]>; "kito.cheng" <[email protected]>; "rdapp.gcc" <[email protected]>; "palmer" <[email protected]>
Subject: Re: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i
On 8/11/23 03:01, Lehua Ding wrote:
> Hi,
>
> This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new
> pattern to optimize the special case when the scalar operand is zero.
>
> Currently, a broadcast pattern whose scalar operand is an immediate is
> converted from vmv.s.x to vmv.v.i, and the mask operand is converted
> from 00..01 to 11..11. After discussing with Juzhe offline, we weighed
> the advantages and disadvantages before and after the conversion and
> chose not to do this transform.
>
> Before:
>
> Advantages:    The vsetvli info required by vmv.s.x has better
>                compatibility, since vmv.s.x only requires SEW and that
>                VL be zero or one. That means there are more
>                opportunities to combine with other vsetvl infos in the
>                vsetvl pass.
>
> Disadvantages: For a non-zero scalar immediate, one more `li rd, imm`
>                instruction is needed.
>
> After:
>
> Advantages:    No `li rd, imm` instruction is needed, since vmv.v.i
>                supports an immediate operand.
>
> Disadvantages: The opposite of the advantages above: the worse
>                compatibility leads to more vsetvl instructions being
>                needed.
>
> Consider the C code below and the asm after autovectorization:
> there is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
> after converting vmv.s.x to vmv.v.i.
>
> ```
> int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
>   int sum = 0;
>   for (int i = 0; i < n; i++)
>     sum += a[i] * b[i];
>
>   return sum;
> }
> ```
>
> asm (Before):
>
> ```
> foo1:
>         ble     a3,zero,.L7
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.v.i v1,0
> .L6:
>         vsetvli a5,a3,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a3,a3,a5
>         vle32.v v2,0(a0)
>         vle32.v v3,0(a1)
>         add     a0,a0,a4
>         add     a1,a1,a4
>         vmacc.vv        v1,v3,v2
>         bne     a3,zero,.L6
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.s.x v2,zero
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a0,v1
>         ret
> .L7:
>         li      a0,0
>         ret
> ```
>
> asm (After):
>
> ```
> foo1:
>         ble     a3,zero,.L4
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.v.i v1,0
> .L3:
>         vsetvli a5,a3,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a3,a3,a5
>         vle32.v v2,0(a0)
>         vle32.v v3,0(a1)
>         add     a0,a0,a4
>         add     a1,a1,a4
>         vmacc.vv        v1,v3,v2
>         bne     a3,zero,.L3
>         vsetivli        zero,1,e32,m1,ta,ma
>         vmv.v.i v2,0
>         vsetvli a2,zero,e32,m1,ta,ma
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a0,v1
>         ret
> .L4:
>         li      a0,0
>         ret
> ```
>
> Best,
> Lehua
>
> Co-Authored-By: Ju-Zhe Zhong <[email protected]>
>
> gcc/ChangeLog:
>
> 	* config/riscv/predicates.md (vector_const_0_operand): New.
> 	* config/riscv/vector.md (*pred_broadcast<mode>_zero): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
> 	* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
If we encounter a uarch where the other sequence is better, then I think
we can do something like query costs or the like and select between the
approaches -- but no need to do that now.
So OK for the trunk.
jeff