Re: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

Jeff Law via Gcc-patches Mon, 12 Jun 2023 12:42:50 -0700



On 6/12/23 09:11, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong <juzhe.zh...@rivai.ai>

According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

We can enhance VLA SLP auto-vectorization with the decompress operation
(section 16.5.1, "Synthesizing vdecompress").

Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1,
POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });

Case 2 (nunits = POLY_INT_CST [16, 16]):
_23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3],
POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1],
POLY_INT_CST [5, 3], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits),
mask = { 0, 1, 0, 1, ... });
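
To make the two cases above concrete, here is a rough scalar model in C that
checks the permute form against the decompress form. It is only a sketch under
stated assumptions: the lane count N is fixed at 8 instead of being a poly
value, all helper names (vec_perm, decompress, slidedown,
decompress_matches_perm) are made up for illustration, and the even lanes are
assumed to be filled from the first operand using the same running index that
viota.m would produce. The mail itself does not spell out that last detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8 /* illustrative fixed lane count standing in for the poly nunits */

/* VEC_PERM_EXPR model: selector values 0..N-1 read op0, N..2N-1 read op1. */
static void vec_perm(const uint64_t *op0, const uint64_t *op1,
                     const uint64_t *sel, uint64_t *out)
{
    for (size_t i = 0; i < N; ++i) {
        assert(sel[i] < 2 * N);
        out[i] = sel[i] < N ? op0[sel[i]] : op1[sel[i] - N];
    }
}

/* Decompress model (assumption): count is the number of set mask bits below
   lane i; masked lanes read op1[count], other lanes read op0[count]. */
static void decompress(const uint64_t *op0, const uint64_t *op1,
                       const uint8_t *mask, uint64_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < N; ++i) {
        out[i] = mask[i] ? op1[count] : op0[count];
        count += mask[i] ? 1 : 0;
    }
}

/* Slide down by offset lanes; lanes past the end read as 0 in this model. */
static void slidedown(const uint64_t *in, size_t offset, uint64_t *out)
{
    for (size_t i = 0; i < N; ++i)
        out[i] = i + offset < N ? in[i + offset] : 0;
}

/* Returns 1 when both cases agree with their VEC_PERM_EXPR form. */
static int decompress_matches_perm(void)
{
    uint64_t a[N], b[N], sel[N], want[N], got[N], sa[N], sb[N];
    uint8_t mask[N];

    for (size_t i = 0; i < N; ++i) {
        a[i] = 10 + i;
        b[i] = 100 + i;
        mask[i] = (uint8_t)(i & 1); /* { 0, 1, 0, 1, ... } */
    }

    /* Case 1: selector { 0, N, 1, N + 1, 2, N + 2, ... } */
    for (size_t i = 0; i < N; ++i)
        sel[i] = ((i & 1) ? N : 0) + i / 2;
    vec_perm(a, b, sel, want);
    decompress(a, b, mask, got);
    for (size_t i = 0; i < N; ++i)
        if (want[i] != got[i])
            return 0;

    /* Case 2: selector { N/2, N + N/2, N/2 + 1, N + N/2 + 1, ... } */
    for (size_t i = 0; i < N; ++i)
        sel[i] = ((i & 1) ? N : 0) + N / 2 + i / 2;
    vec_perm(a, b, sel, want);
    slidedown(a, N / 2, sa);
    slidedown(b, N / 2, sb);
    decompress(sa, sb, mask, got);
    for (size_t i = 0; i < N; ++i)
        if (want[i] != got[i])
            return 0;

    return 1;
}
```

In this model Case 1 is an interleave of the low halves of the two inputs and
Case 2 is the same interleave applied to the high halves, which is why a
slidedown by half the vector length is enough to reduce Case 2 to Case 1.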

For example:
void __attribute__ ((noinline, noclone))
vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
   for (int i = 0; i < n; ++i)
     {
       a[i * 2] += b;
       a[i * 2 + 1] += c;
     }
}

ASM:
...
         vid.v   v0
         vand.vi v0,v0,1
         vmseq.vi        v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
vdecompress:
         viota.m v3,v0
         vrgather.vv     v2,v1,v3,v0.t
Loop:
         vsetvli zero,a5,e64,m1,ta,ma
         vle64.v v1,0(a0)
         vsetvli a6,zero,e64,m1,ta,ma
         vadd.vv v1,v2,v1
         vsetvli zero,a5,e64,m1,ta,ma
         mv      a5,a3
         vse64.v v1,0(a0)
         add     a3,a3,a1
         add     a0,a0,a2
         bgtu    a5,a4,.L4
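
As a reading aid for the ASM above, the following C sketch models the preamble
and the loop lane by lane. Several things are assumptions rather than facts
from the patch: the lane count is fixed at 8, v2 and v1 are assumed to hold
splats of b and c set up in the elided "..." part, the masked vrgather.vv is
assumed to leave inactive lanes of v2 undisturbed, and the function names
build_addend and vec_slp_model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* illustrative fixed lane count; the real code is VLA */

/* Scalar model of the preamble: builds the addend vector { b, c, b, c, ... }. */
static void build_addend(uint64_t b, uint64_t c, uint64_t v2[LANES])
{
    uint64_t v1[LANES]; /* assumed: splat of c from the elided setup */
    uint64_t v3[LANES]; /* viota.m result */
    uint8_t v0[LANES];  /* mask register */
    uint64_t count = 0;

    for (size_t i = 0; i < LANES; ++i) {
        v2[i] = b;      /* assumed: splat of b from the elided setup */
        v1[i] = c;
    }

    /* vid.v; vand.vi v0,v0,1; vmseq.vi v0,v0,1 => mask = { 0, 1, 0, 1, ... } */
    for (size_t i = 0; i < LANES; ++i)
        v0[i] = (uint8_t)(((uint64_t)i & 1) == 1);

    /* viota.m v3,v0: number of set mask bits strictly below each lane */
    for (size_t i = 0; i < LANES; ++i) {
        v3[i] = count;
        count += v0[i];
    }

    /* vrgather.vv v2,v1,v3,v0.t: only active lanes are written */
    for (size_t i = 0; i < LANES; ++i) {
        if (v0[i]) {
            assert(v3[i] < LANES);
            v2[i] = v1[v3[i]];
        }
    }
}

/* Model of the loop: add the precomputed addend to a[] one chunk at a time. */
static void vec_slp_model(uint64_t *a, uint64_t b, uint64_t c, int n)
{
    uint64_t addend[LANES];

    if (n <= 0)
        return;
    build_addend(b, c, addend);

    size_t total = (size_t)n * 2;
    for (size_t base = 0; base < total; base += LANES)
        for (size_t i = 0; i < LANES && base + i < total; ++i)
            a[base + i] += addend[i];
}
```

Because the addend vector is loop-invariant, the mask, viota and gather work
sits outside the loop, and each iteration is just a load, an add and a store.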


gcc/ChangeLog:

         * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
         (shuffle_decompress_patterns): New function.
         (expand_vec_perm_const_1): Add decompress optimization.

gcc/testsuite/ChangeLog:

         * gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.

I've been wanting to get inside expand_vec_perm_const to see what opportunities might exist to improve code in there. We had good success mining this space at a prior employer. While we had a lot of weird idioms and costs to consider, it was well worth the time.


So quite happy to see you diving into this code.

OK for the trunk,
Jeff
