I split patterns of vwadd/vwsub, and add dedicated patterns for them.
2. This patch not only optimize the case as above (1) mentioned, also enhance
vwadd.vv/vwsub.vv
optimization for complicate PLUS/MINUS codes, consider this following codes:
__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int8_t *__restrict a,
int8_t *__restrict b, int8_t *__restrict a2,
int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] + (int16_t) b[i];
dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
}
}
Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v2,0(a3)
vle8.v v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v3,v2
vsext.vf2 v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v1,v4
vadd.vv v2,v1,v2
...
After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v3,0(a4)
vle8.v v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...
The reason why current upstream GCC can not optimize codes using vwadd
thoroughly is combine PASS
needs intermediate RTL IR (extend one of the operand pattern (vwadd.wv)), then
base on this intermediate
RTL IR, extend the other operand to generate vwadd.vv.
So vwadd.wv/vwsub.wv definitely helps to vwadd.vv/vwsub.vv code optimizations.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Change
vwadd.wv/vwsub.wv intrinsic API expander
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
(@pred_single_widen_sub<any_extend:su><mode>): New pattern.
(@pred_single_widen_add<any_extend:su><mode>): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.