On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richard and Richi.
> 
> Currently, we statically return the vectorization factor from
> 'TARGET_VECTORIZE_PREFERRED_SIMD_MODE'
> according to a compile option.
> 
> For example:
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
>   for (int i = 0; i < n; i++)
>     a[i] = a[i] + b[i];
> }
> 
> with --param=riscv-autovec-lmul=m1:
> 
> vsetvli a5,a2,e32,m1,ta,ma
> vle32.v v2,0(a0)
> vle32.v v1,0(a1)
> vsetvli a6,zero,e32,m1,ta,ma
> slli a3,a5,2
> vadd.vv v1,v1,v2
> sub a2,a2,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a4)
> add a0,a0,a3
> add a1,a1,a3
> add a4,a4,a3
> bne a2,zero,.L3
> 
> The 'vadd.vv' is only performing operations on a single register.
> 
> with --param=riscv-autovec-lmul=m8:
> 
>   vsetvli a5,a2,e8,m2,ta,ma
>   vle32.v v16,0(a0)
>   vle32.v v8,0(a1)
>   vsetvli a6,zero,e32,m8,ta,ma
>   slli a3,a5,2
>   vadd.vv v8,v8,v16
>   vsetvli zero,a2,e32,m8,ta,ma
>   sub a2,a2,a5
>   vse32.v v8,0(a4)
>   add a0,a0,a3
>   add a1,a1,a3
>   add a4,a4,a3
>   bne a2,zero,.L3
> 
> The 'vadd.vv' here is performing operations on 8 consecutive registers:
> 
> vadd.vv [v8 - v15], [v8 - v15], [v16 - v23]
> 
> Having users statically set the vectorization factor is not ideal.
> 
> We want GCC to dynamically choose the vectorization factor for
> auto-vectorization according to loop analysis.
> 
> Currently, I have implemented a simplistic loop analysis that computes the
> live range of each local decl of the current function.
> 
> Here is the analysis: we have 32 vector registers for RVV,
> so we compute the live ranges of the current function's local decls and require:
> 
> the number of decls live at the same time * LMUL <= 32.
> According to this analysis, I set the vectorization factor in
> TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
> 
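
The constraint quoted above can be sketched in plain C. This is only an
illustration under stated assumptions, not code from the RISC-V backend:
pick_lmul and its candidate list are hypothetical names, and the number of
simultaneously live vectors is taken as an input rather than computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC function): return the largest LMUL in
   {8, 4, 2, 1} for which live_vectors * LMUL still fits in the 32 RVV
   vector registers.  Falls back to 1 when even LMUL = 1 does not fit.  */
static int
pick_lmul (int live_vectors)
{
  static const int candidates[] = { 8, 4, 2, 1 };

  assert (live_vectors >= 0);
  for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    if (live_vectors * candidates[i] <= 32)
      return candidates[i];
  return 1;
}
```

With made-up inputs, pick_lmul (2) gives 8 and pick_lmul (6) gives 4.
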
> This simplistic algorithm (implemented in the RISC-V backend) works well for
> the testcases I produced.
> 
> However, I can only choose the optimal vectorization factor for a whole
> function, not for a specific loop.
> 
> Here is the example:
> 
> void foo2 (int32_t *__restrict a,
>           int32_t *__restrict b,
>           int32_t *__restrict c,
>           int32_t *__restrict a2,
>           int32_t *__restrict b2,
>           int32_t *__restrict c2,
>           int32_t *__restrict a3,
>           int32_t *__restrict b3,
>           int32_t *__restrict c3,
>           int32_t *__restrict a4,
>           int32_t *__restrict b4,
>           int32_t *__restrict c4,
>           int32_t *__restrict a5,
>           int32_t *__restrict b5,
>           int32_t *__restrict c5,
>           int n)
> {
> // Loop 1
>     for (int i = 0; i < n; i++)
>        a[i] = a[i] + b[i];
> // Loop 2
>     for (int i = 0; i < n; i++){
>       a[i] = b[i] + c[i];
>       a2[i] = b2[i] + c2[i];
>       a3[i] = b3[i] + c3[i];
>       a4[i] = b4[i] + c4[i];
>       a5[i] = a[i] + a4[i];
>       a[i] = a3[i] + a2[i]+ a5[i];
>     }
> }
> 
> For Loop 1 we can aggressively choose LMUL = 8, but Loop 2 should choose LMUL = 4
> (since LMUL = 8 would cause vector register spills).
> 
> If I split loop 1 and loop 2 into 2 separate functions, my algorithm works 
> well.
> 
> However, if we put these 2 loops in the same function, I end up picking LMUL = 4
> for both loop 1 and loop 2 since, as I said above, I do the analysis per
> function, not per loop.
> 
> I am struggling to find a good solution for this issue. Can we pass
> loop_vec_info
> to the 'preferred_simd_mode' target hook?

That's not how it's currently designed to work - there's
the autovectorize_vector_modes hook where you should provide a vector
of modes the vectorizer iterates over and return VECT_COMPARE_COST
if you want to evaluate costs between choices.  Your analysis should
then happen in the finish_cost method.

That's the current design.  It might not be optimal for
compile time when there are many modes; giving the target
more control (and context) might be possible.
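
To illustrate the idea of iterating over candidates and comparing costs
per loop, here is a toy model in plain C.  It deliberately does not use
the real hook signatures; toy_loop_cost, choose_lmul_for_loop, the cost
numbers and the live-vector counts are all hypothetical, standing in for
whatever the target would compute when costing each candidate mode.

```c
#include <assert.h>
#include <stddef.h>

#define RVV_NUM_VREGS 32

/* Toy cost of one loop at a given LMUL (made-up model): a larger LMUL
   means fewer iterations, but every register needed beyond the 32
   available adds a heavy spill penalty.  */
static unsigned
toy_loop_cost (int live_vectors, int lmul)
{
  assert (live_vectors > 0 && lmul > 0);
  unsigned cost = 64u / (unsigned) lmul;
  int needed = live_vectors * lmul;
  if (needed > RVV_NUM_VREGS)
    cost += 100u * (unsigned) (needed - RVV_NUM_VREGS);
  return cost;
}

/* Try each candidate LMUL in turn and keep the cheapest.  This is
   called once per loop, so different loops in the same function can
   end up with different choices.  */
static int
choose_lmul_for_loop (int live_vectors)
{
  static const int candidates[] = { 8, 4, 2, 1 };
  int best = candidates[0];
  unsigned best_cost = toy_loop_cost (live_vectors, best);

  for (size_t i = 1; i < sizeof candidates / sizeof candidates[0]; i++)
    {
      unsigned cost = toy_loop_cost (live_vectors, candidates[i]);
      if (cost < best_cost)
        {
          best_cost = cost;
          best = candidates[i];
        }
    }
  return best;
}
```

Assuming, purely for illustration, 2 live vectors in the first loop and 6
in the second, choose_lmul_for_loop returns 8 and 4 respectively, which
is the per-loop outcome asked for above.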

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
