Hi, Jeff.

>> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
>> complete.  While you might be able to get REG_EXPR, I would not really
>> expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
>> way to make sure it's not called at an inappropriate time.
I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.

>> Should this have been known_lt rather than known_le?
It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE for 
SLP.

>> Something's off in your formatting here.  I'd guess spaces vs tabs
Ok.

>>In a few places you're using expand_binop.  Those interfaces are really
>>more for gimple->RTL.  BUt code like expand_gather_scatter is really
>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
>>interfaces?
I saw ARM SVE is using them in many places for expanding patterns.
And I think it's convenient so that's why I use them.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-07-12 10:01
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; rdapp.gcc
Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
 
 
On 7/7/23 08:32, Juzhe-Zhong wrote:
> This patch fully support gather_load/scatter_store:
> 1. Support single-rgroup on both RV32/RV64.
> 2. Support indexed element width can be same as or smaller than Pmode.
> 3. Support VLA SLP with gather/scatter.
> 4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 both VLA and VLS.
> 5. Fix bug of handling (subreg:SI (const_poly_int:DI))
> 6. Fix bug on vec_perm which is used by gather/scatter SLP.
> 
> All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
> We fully supported these 4 kinds of gather/scatter:
> 1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy 
> mask (Full vector).
> 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real 
> mask.
> 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy 
> mask.
> 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
> 
> We use vluxei/vsuxei (un-ordered indexed loads/stores of RVV to code generate 
> gather/scatter).
> 
> Also, we support strided loads/stores with vlse.v/vsse.v. Consider this 
> following case:
> #define TEST_LOOP(DATA_TYPE, BITS)                                            
>  \
>    void __attribute__ ((noinline, noclone))                                   
>   \
>    f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, 
>   \
>   INDEX##BITS stride, INDEX##BITS n)                   \
>    {                                                                          
>   \
>      for (INDEX##BITS i = 0; i < n; ++i)                                      
>   \
>        dest[i] += src[i * stride];                                            
>   \
>    }
> 
> Codegen:
> f_int8_t_8:
> ble a3,zero,.L10
> li a5,1
> mv a4,a0
> bne a2,a5,.L4
> li a2,1
> .L6:
> vsetvli a5,a3,e8,m2,ta,ma
> vle8.v v2,0(a0)
> vlse8.v v4,0(a1),a2
> vsetvli a6,zero,e8,m2,ta,ma
> sub a3,a3,a5
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e8,m2,ta,ma
> vse8.v v2,0(a4)
> add a0,a0,a5
> add a1,a1,a5
> add a4,a4,a5
> bne a3,zero,.L6
> .L10:
> ret
> 
> We use vlse.v instead of vluxei.
> 
> This patch has been tested on both RV32 and RV64.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/autovec.md 
> (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
>          (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
>          (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
>          (len_mask_gather_load<mode><mode>): Ditto.
>          (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
>          (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
>          (len_mask_scatter_store<mode><mode>): Ditto.
>          * config/riscv/predicates.md (const_1_operand): New predicate.
>          (vector_gs_offset_operand): Ditto.
>          (vector_gs_scale_operand_16): Ditto.
>          (vector_gs_scale_operand_32): Ditto.
>          (vector_gs_scale_operand_64): Ditto.
>          (vector_gs_extension_operand): Ditto.
>          (vector_gs_scale_operand_16_rv32): Ditto.
>          (vector_gs_scale_operand_32_rv32): Ditto.
>          * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
>          (expand_gather_scatter): New function.
>          * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
>          (emit_vlmax_masked_store_insn): New function.
>          (emit_nonvlmax_masked_store_insn): Ditto.
>          (modulo_sel_indices): Ditto.
>          (expand_vec_perm): Fix SLP for gather/scatter.
>          (prepare_gather_scatter): New function.
>          (strided_load_store_p): Ditto.
>          (expand_gather_scatter): Ditto.
>          * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of 
> (subreg:SI (DI CONST_POLY_INT)).
>          * config/riscv/vector-iterators.md: Add gather/scatter.
>          * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
>          (@vec_duplicate<mode>): Ditto.
>          (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix 
> name.
>          (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): 
> Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: 
> New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New 
> test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: 
> New test.
>          * 
> gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New 
> test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: 
> New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: 
> New test.
> 
> ---
 
 
 
 
 
 
> +
> +/* Return true if it is the strided load/store.  */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +    return true;
> +
> +  /* For strided load/store, vectorizer always generates
> +     VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_EXPR (vec_offset);
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +    return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +    return false;
Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is 
complete.  While you might be able to get REG_EXPR, I would not really 
expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some 
way to make sure it's not called at an inappropriate time.
 
 
> +
> +/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
> +void
> +expand_gather_scatter (rtx *ops, bool is_load)
> +{
> +
> +  /* We use vlse.v/vsse.v instead of indexed load/store by default
> +     if it is strided load/store.
> +
> +     FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
> +     We may need COST MODE to adjust it.  */
I'd be surprised if we encounter a case where vector strided will be 
worse than the equivalent vector indexed.  In the unlikely event that 
happens, we'll have to implement a suitable cost model and splat the 
stride into a vector index register.    But I wouldn't worry too much 
about it at this stage.
 
 
> +  rtx base, step;
> +  if (strided_load_store_p (vec_offset, &base, &step))
> +    {
> +      if (GET_MODE (step) != Pmode)
> + {
> +   if (CONSTANT_P (step))
> +     step = force_reg (Pmode, step);
> +   else
> +     {
> +       rtx extend_step = gen_reg_rtx (Pmode);
> +       emit_insn (gen_extend_insn (extend_step, step, Pmode,
> +   GET_MODE (step),
> +   zero_extend_p ? true : false));
> +       step = extend_step;
> +     }
What happens for a non-constant step in a mode the same size as Pmode, 
particularly in a non-optimizing compilation?  Wouldn't that abort with 
an unrecognized extension insn?
 
I'd have similar concerns with the code that handles the case 
inner_offsize < inner_vsize.
 
 
 
 
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 5b7a17b9d34..19740c89132 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1357,8 +1357,16 @@
>   }
>       }
>     else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
> -           && immediate_operand (operands[3], Pmode))
> -    operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, 
> operands[3]));
> +           && (immediate_operand (operands[3], Pmode)
> +        || (CONST_POLY_INT_P (operands[3])
> +            && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
> +    && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE 
> (<MODE>mode)))))
Should this have been known_lt rather than known_le?
 
 
> @@ -1397,6 +1406,12 @@
>     (match_dup 2)))]
>     {
>       gcc_assert (can_create_pseudo_p ());
> +    if (CONST_POLY_INT_P (operands[3]))
> +      {
> +        rtx tmp = gen_reg_rtx (<VEL>mode);
> + emit_move_insn (tmp, operands[3]);
> + operands[3] = tmp;
> +      }
Something's off in your formatting here.  I'd guess spaces vs tabs
 
 
In a few places you're using expand_binop.  Those interfaces are really 
more for gimple->RTL.  BUt code like expand_gather_scatter is really 
RTL, not gimple/tree.   Is there a reason why you're not using pure RTL 
interfaces?
 
Anyway this is mostly good, but I do think there are a few outstanding 
questions/concerns to work through.
 
Jeff
 

Reply via email to